Non-expert nation


Scientists — just like everybody else — have little idea what will happen
now that the United Kingdom has voted to exit the European Union.


Psychologists who have studied the peculiar phenomenon of buyer regret —
the second thoughts that follow the purchase of a shiny new car, say —
note a curious paradox. The more effort that consumers put into making
their decision, the more information they seek and the more they weigh
up the options, the more likely they are to want to change their mind later.

Just how much careful thought the people of the United Kingdom
put into last week’s decision to quit the European Union is currently a
matter of some debate. But if the prominent examples of buyer regret
among people who voted ‘Leave’ and now want to ‘Remain’ are any
guide, it may have been more than many critics think.

Psychologists might conclude that Kelvin Mackenzie, the former
editor and now columnist of The Sun newspaper, must have been
weighing up the options very carefully indeed when he wrote his “10
reasons why you must vote Brexit” the week before the crucial vote.
How else to explain his U-turn, a few days after 52% of voters heeded
his demand, when he admitted: “I have buyer’s remorse. A sense of be
careful what you wish for. To be truthful I am fearful of what lies ahead.”

Scientists in the United Kingdom and elsewhere share his anxiety —
and fear. Hundreds have responded to calls from this journal to express
their feelings, and the overwhelming question that they have replied
with is: what happens now?

UK politicians who pushed for the country to exit the EU have gone
to ground. A similar silence reigns in the European Commission's
research directorate. Commission sources mutter darkly, and only off
the record, of ‘uncharted territories’ and ‘needing time’ to consider the
many issues that will arise. UK politicians and the research directorate
declined to engage before the vote with the ‘what if’ question, at least
publicly. So it is no surprise that scientists have been left with the feeling
that no one had planned for the Brexit eventuality. What will be the
status of those from other EU countries doing their PhDs or postdoctoral
research in the United Kingdom? What will happen to the EU-funded
research collaborations that are led from the United Kingdom?

What do we know for sure? Some of the most familiar European
research facilities are not creatures of the EU, so will remain
fundamentally unaffected by Brexit. These include the European
particle-physics laboratory CERN, the European Molecular Biology
Laboratory and the European Space Agency.

More recently, the European Commission has found a way to steer
the creation of other, much-needed Europe-wide research infrastructures
through an umbrella structure called ESFRI (European Strategy
Forum on Research Infrastructures) that helps to foster intergovernmental
agreements in which it has no fundamental role.

Some research infrastructures are based on a particular legal framework
that stipulates that the host country must be a member state.
For the European Spallation Source, headquartered in Sweden, and
the Biobanking and BioMolecular resources Research Infrastructure,
headquartered in Austria, nothing changes. For the European Social
Survey and the structural-biology infrastructure known as Instruct,
both headquartered in the United Kingdom, Brexit means that new
arrangements will have to be made; internal talks have already begun.
Talks on similar agreements for core European Commission scientific
activities won't start until the United Kingdom formally declares
its exit by triggering the much-discussed article 50 of the EU treaty.
When (and if) that will happen depends on how quickly the country
resolves various questions of its own: not least, who the next prime
minister will be, the proper legal route, and the broader constitutional
question of whether it should follow through on a democratic decision
that seems likely to damage the prospects of so many who voted for it.

If the United Kingdom does trigger article 50, research facilities 
owned by the commission and stationed in the country, such as the 
nuclear-fusion facility JET, face an uncertain future. And until a new 
agreement is made, UK scientists will be shut out of the EU’s multi- 
billion-euro Horizon 2020 programme — including its prestigious 
European Research Council granting body, from which the United 
Kingdom benefits more than any other country, by a wide margin. 

Michael Gove, a senior figure in the Leave camp, notoriously claimed 
during the campaign that the United Kingdom has “had enough of 
experts”. He has got his wish, but he should beware: buyer regret is not 
available to those who did the selling. SEE NEWS P.597




The big picture 


Interdisciplinary research is vital if we are to 
meet the diverse needs of modern society. 


To tackle society’s challenges through research requires the
engagement of multiple disciplines. For two examples, in
responding to the challenges of climate change and of social
progress, see the Comment articles on pages 613 and 616, respectively.

To highlight the issues that arise in such research, imagine an
integrated project to determine the causes of destructive risk-taking in
inner-city adolescents and to identify appropriate interventions. Such
a programme might combine disciplines ranging from anthropology,
sociology, psychology, law, economics and ethics to psychiatry, health
systems, urban design and developmental neurobiology.

To frame the research challenge, and to design interventions that
will be effective in targeted neighbourhoods, academic researchers
need to work with non-academic partners to understand the needs
of the community, the political context and the barriers — structural
and behavioural — to applying the lessons that might be learned. The
researchers would also need to learn how colleagues from other
disciplines approach the issues and frame the research questions in a
mutually acceptable way. They must also learn to respect what is possible
in each discipline, and how insights are gained and possible
implementations are made. All this is easier said than done, but it is essential.

Funders must rise to the challenge of supporting these tough 
research necessities. That means having enough of an overview of 
a project to oversee the selection of peer reviewers whose individual 
perspectives will inevitably be narrower than those of the project. An 
ideal funder would also include potential users of the project’s out- 
come among its assessors, to ensure that the research has practical 
impact as well as academic weight. 

The world is ill-equipped to uphold such ideals. For example, a 
paper published in this issue of Nature (L. Bromham et al. Nature
534, 684–687; 2016) provides evidence that multidisciplinary research
is less attractive to funders than single-discipline research. The work is 
based on an analysis of grant applications to the Australian Research 
Council, but there is every reason to believe that the conclusion can 
be generalized. The metrics of interdisciplinarity introduced by the 
authors can also serve as warning indicators for funders, telling them 
when they need to take special measures to do a project justice. 

The good news is that many funding agencies are aware of the 
challenge, and of how far they need to go to meet it. The Global 
Research Council (GRC) is a forum in which government funders 
discuss their common challenges. At its annual meeting in Delhi last 
month, the focus was on interdisciplinarity. The council commissioned 
a survey and analysis of the practices of many funders. It also issued a
statement of principles on interdisciplinarity (go.nature.com/290mqqt).

The GRC is not a decision-making body. But it was evident at the 
meeting that the funders recognize the need for new measures. An 
obvious one is that grants should last long enough for interdiscipli- 
nary research to take shape. Another is that funding agencies should 
have a good enough grasp of the subject matter to ensure that a well- 
informed, multidisciplinary assessment can be conducted. 

Journals, too, must face up to such challenges. Nature and its
research journals take pride in their capacity to handle interdisciplinary
research. The multidisciplinary editorial teams see it as part of their
job to do so — in selecting referees from diverse disciplines, and in
considering their comments within the framing of the paper under
discussion, rather than that of the individual assessors. In such a
context, it is not unknown for Nature’s editors to overrule all referees’
recommendations against publication of a technically valid paper, and
to publish it.

What is more, the Nature journals are recruiting social scientists 
to address our editorial goal of increasing the attention given to the 
societal challenges of sustainability and health. Nature itself will soon 
be recruiting social-sciences editors. In launching Nature Climate 
Change and Nature Energy, and as we recruit for the launch of Nature 
Human Behaviour next year, we have already learned some impor- 
tant lessons about the sense of professional identity of sociologists, 
anthropologists, economists and psychologists. 

Without that developing sense of respect for diverse types of 
quantitative and qualitative research, progress by funders, publishers 
and universities in interdisciplinary research will founder.


Calculated risks 


Gene-therapy trials must move forward, but 
not without due consideration of the dangers. 


Jesse Gelsinger was 18 and healthy when he died in 1999 during a
gene-therapy experiment. He had a condition called ornithine
transcarbamylase deficiency (OTC), but it was under control through
a combination of diet and medication. Like others with the disorder,
Gelsinger lacked a functional enzyme involved in breaking down
ammonia, a waste product of protein metabolism that becomes toxic
when its levels become too high. The gene therapy that he received
used a viral vector to introduce a normal gene for the enzyme.

Gene therapy remains an obvious route to treat OTC. Simply adding
the missing gene has been shown to repair metabolism in mice. But the
memory of what happened to Gelsinger has slowed progress in gene
therapy for any condition.

That memory was firmly on the agenda at a meeting of the US
National Institutes of Health’s Recombinant DNA Advisory Committee
(RAC) last week. The RAC evaluates proposals to use modified
DNA in human trials, and presenting to it were Cary Harding, a medical
geneticist at Oregon Health and Science University in Portland, and
Sam Wadsworth, chief scientific officer at Dimension Therapeutics
in Cambridge, Massachusetts. The duo were proposing the first new
trial of gene therapy for OTC.

Harding and the researchers at Dimension argue that the technology
and our understanding of physiology have advanced enough since 1999
to try it again in people. Gelsinger died after his body overreacted to
the vector used to introduce the OTC gene. Dimension’s therapy uses
a different viral vector, called AAV8, which has been tested numerous
times in people with other conditions, with few adverse effects.

Such assurances were not enough for the RAC, and particularly not
for its bioethicists and historians. Dawn Wooley, a virologist at Wright
State University in Dayton, Ohio, pointed out that an RAC panel raised 
concerns about Gelsinger’s trial in 1995, but decided to let the test go 
ahead. “We can't let it happen again, we cannot,” she says.

Perhaps the greatest indication of how Gelsinger’s death haunts the 
RAC came when one member suggested that the researchers explain 
in the consent form to be sent to prospective participants that someone 
had died in a similar study and attracted media attention.

There are some scientific reasons to be careful. AAV8 can cause mild 
liver toxicity in healthy people, and the steroids used to treat that could 
lead to complications in people with OTC. With so little known about 
these effects, the RAC members suggested that the researchers lower 
the dose to one that is more likely to be safe, even if it is potentially 
not effective. 

After some discussion, the RAC voted unanimously to approve the 
trial. However, that came with a long list of conditions, including that 
the treatment first be tested in a second animal species. The research- 
ers disagree with most of the conditions, believing that more expensive 
animal trials will add nothing. They feel that they are being held to a 
different standard from most trials. 

Dimension still plans to submit an application to the US Food and
Drug Administration (FDA) later this year to start a clinical trial. It is
unclear how heavily the RAC’s recommendations weigh into FDA decisions,
but Wadsworth says that the company will conduct its trials overseas
if necessary. “These patients have been waiting a long time,” he says.

He is right. Therapies can be tested in non-human animals only
for so long — at some point, volunteers such as Gelsinger must step
forward. Yet the echoes of a trial done 17 years ago cannot be easily
silenced. In fact, Gelsinger’s name came up several times at the RAC
meeting. Researchers from the University of Pennsylvania in Philadelphia
had even mentioned him earlier that morning, when proposing
the first human trial of CRISPR gene-editing technology as a treatment
for cancer. The RAC approved that proposal, but its implication was
clear: take care. Avoidable failures could stymie CRISPR research for
decades. History must not repeat itself.


Stop teaching Indians to copy and paste


Major reform of education in India should encourage original thinking
to boost the nation’s research, argues Anurag Chaurasia.


My eight-year-old son came home from school disappointed
last week. When asked the test question “How can we save
the environment from pollution?”, he had tried to write the
answer in his own way. This did not go down well with his teacher, who
cut his mark and asked why he had not repeated the answer as it was
printed in the textbook. That’s common practice in India. To get top
marks, school children must learn and regurgitate answers presented
to them. With such a culture, is it any wonder that plagiarism and
unoriginal thinking are so prevalent in Indian science and research?

We should all be disappointed with my son’s experience at his
school. And India currently has a rare, possibly once-in-a-lifetime
opportunity to sort it out. A major review of the nation’s education
system has made several recommendations to the government, which has
so far not published them. Scientists and others are now waiting for the
government to say what it will do.

The education system is the best place to start to improve Indian
science. Many of the problems that hold back Indian research are set in
motion when researchers are in school and university. Science, they are
told and shown, is about answering questions, not asking them. Even at
university level, we are taught to learn from the class notes written by
the teachers on the board, who themselves copy it from a book, and to
answer in the same way in the examination.

This slack attitude goes right to the top. Successful Indian grant
applications often copy text from grants submitted in other countries.
And a 2010 report on genetically modified crops prepared by officials
from six Indian science academies simply cut-and-pasted text from a
previous publication. India doesn't take the offence seriously.
Researchers who are shown to have committed plagiarism — which is
serious misconduct, and enough in many countries to end a career — are
typically given only a note of instruction not to do it again.

India had a rich education system in the past, which gave the world
many influential thinkers and writers. The coming school reform
must attempt to reinstate the once-prized qualities of innovation and
discovery. Perhaps this change will also help to kick-start interest in
the fundamental sciences, which have become less popular in recent
years as students switch to applied sciences, medicine and commerce.

Higher-education institutions can make changes that will have a more
immediate impact on Indian science. They must take a harder line on
plagiarism by setting and enforcing rules and by introducing ethics
classes to show students that the practices that they learned in school
are no longer acceptable. And the government must demand that
universities introduce more and stricter measures to guarantee the standard
of the degrees, and especially the postgraduate qualifications they issue.


Poor standards explain why Indian universities rarely feature, and
sometimes aren't included at all, in league tables of international
institutions. (The 2015-16 Times Higher Education World University
Rankings do not include a single Indian university in the top 200.)
This is unacceptable for a nation of India’s size and ambition.

Some of the best institutions in the country have taken a few steps 
to improve quality — they insist that a PhD project must produce 
two papers in international journals, and that a thesis is reviewed 
by a foreign expert. But these measures are too easily circumvented. 
PhD students simply pay to publish in a low-quality, open-access 
journal and send their thesis to a friend. We need stricter definitions 
of who can publish and review work that will grant a young Indian 
scientist a ticket to academia. 

This is especially true for the private educa- 
tion institutions and publishers that are rapidly 
emerging, and that are polluting Indian science 
and scientific literature just to make money. 
These institutions charge students to complete 
low-grade PhDs and to publish poor work — a 
move encouraged by government officials who 
want to give private education more autonomy. 
Moves to allow foreign institutions to establish 
campuses in India must be closely regulated if 
they are not to make the situation worse. 

There are at least some welcome attempts 
under way to improve journal quality. The Uni- 
versity Grants Commission has asked experts to 
produce a list of approved journals in which aca- 
demics must publish to earn points in the Indian 
system that is used to judge performance and 
award promotions. This idea should be extended 
to include papers that are published as part of a PhD programme. 

The final change that the education reform can bring about for 
Indian science is to alter the selection and attitudes of scientists who 
make it to tenured positions. At present, too many see science as a 
route to a stable career in administration. They want to leave the 
laboratory at the earliest possible opportunity — perhaps because 
they have never learned the true nature and satisfaction of a research 
job well done. In my 20-year scientific career, I have rarely seen any 
researchers who wish to work in the lab instead of opting for a desk job. 
Most of the best Indian scientists initially did very well at the bench 
but soon went into administration, losing their talent in the office files. 

I don't know whether my son will want to be a scientist. But if he 
does, I want him to be a true scientist — and in India, that will demand 
big changes in the way that he is taught.


Anurag Chaurasia is a biotechnologist with the National Bureau of 
Agriculturally Important Microorganisms in Kushmaur, India. 
e-mail: anurag_vns1@yahoo.co.in 
RESEARCH HIGHLIGHTS


Tobacco plants 
make malaria drug 


Inexpensive fast-growing 
plants have been transformed 
into factories that churn out an 
important antimalarial drug. 
Artemisinin is a front-line
malaria treatment,
with hundreds of millions 
of doses taken every year. 
The sweet wormwood plant 
(Artemisia annua) produces 
a precursor of the compound, 
artemisinic acid, only in low 
quantities, and is expensive to 
grow. To scale up production,
a team led by Ralph Bock at 
the Max Planck Institute of 
Molecular Plant Physiology 
in Potsdam-Golm, Germany, 
inserted genes for artemisinic 
acid synthesis into tobacco 
plants’ chloroplasts — 
abundant organelles that have 
their own DNA. By adding 
‘accessory genes’ that make 
artemisinic acid production 
more efficient, they created 
a line that pumps out 
120 milligrams of artemisinic 
acid per kilogram of biomass. 
The researchers estimate 
that the world’s demand for 
the drug could be met with 
just 200 square kilometres 
of tobacco fields — an area 
smaller than the city of Boston. 
eLife 5, e13664 (2016)


The likely root of
night vision


The cells in the retina that enable night vision may have evolved
from those that sense colour.

Typically, most of the light-sensing cells in mammalian retinas are
rod cells, which are sensitive in low light. However, vertebrate
ancestors only had cells resembling cones, which function under bright
light and can discriminate colour. Ted Allison at the University of
Alberta in Edmonton, Canada, Anand Swaroop at the US National Eye
Institute in Bethesda, Maryland, and their co-workers studied mouse rod
and cone cells, and monitored these cells in the developing mouse
retina. They found that early in rod cells’ development the cells
expressed key genes that are normally active in ‘S’ (blue) cones.
Zebrafish rods, however, did not.

The adaptation of cone cells to function under low light may have
allowed mammals to adopt nocturnal lifestyles during mammalian evolution.
Dev. Cell 37, 520–532 (2016)


EVOLUTIONARY BIOLOGY


Scales and fur have shared origin


Mammals, birds and reptiles inherited key cell
structures that give rise to their fur, feathers and
scales from a shared reptilian ancestor.
Scientists have long debated whether these
skin appendages evolved independently or had
a single origin. To find out, Nicolas Di-Poi and
Michel Milinkovitch at the University of Geneva
in Switzerland studied skin development in
embryos of Nile crocodiles (Crocodylus niloticus;
pictured right), corn snakes (Pantherophis
guttatus) and bearded dragons (Pogona
vitticeps). They found that reptilian scales, like
feathers and mammalian hair (mouse embryo
pictured left), develop from a group of cells
called the anatomical placode (pictured as dark
blue spots). These appear only briefly during
development in snakes and lizards, and were
previously not detected and so thought to be
missing in these animals.

These cells express the same developmental
genes as bird and mammalian placodes,
suggesting a common origin for modern hair,
feathers and scales.

Sci. Adv. 2, e1600708 (2016)


Nanopores harvest
wasted heat


A membrane with nanometre-sized pores can capture low levels of
heat energy to generate power.

Industrial plants are abundant sources of waste heat, but the
relatively small temperature difference between the source (which
is usually below 100°C) and its surroundings makes it hard to exploit.
Menachem Elimelech of Yale University in New Haven, Connecticut,
and his colleagues used a water-repellent membrane that traps air in
its pores and placed it between hot and cold water streams, creating
a tiny air gap between the streams. The hot water evaporates on one
side of the membrane, passes through the pores and condenses in
the cold stream, creating hydraulic pressure that drives a turbine.

With a heat source at a temperature of only 60°C, the device
transferred power densities of up to 3.5 watts per square metre to a
20°C fluid.
Nature Energy http://dx.doi.org/10.1038/nenergy.2016.90
(2016)




IMMUNOLOGY 


Insect bites make 
viral disease worse 


Bites from mosquitoes that
spread viruses trigger a distinct
immune response in the skin,
which increases the severity
of infection caused by the
transmitted virus.

Clive McKimmie at the 
University of Leeds, UK, and 
his colleagues injected mice 
with one of two mosquito- 
borne viruses. Mice that were 
bitten by virus-free mosquitoes 
and then injected with the 
microbe showed an immune 
response that retained more 
virus at infection sites than 
did infected mice that had 
not been bitten. Immune 
cells called neutrophils were 
drawn to the bite, where they 
enhanced the virus’ ability to 
infect and multiply, causing 
more-severe disease. 

Blocking certain immune 
cells from reaching the site 
of an insect bite reduces viral 
replication and could be a way 
to diminish disease after a bite, 
the authors say. 

Immunity 44, 1455-1469 (2016) 


BEHAVIOUR 


Older monkeys 
socialize less 


Like humans, some monkeys 
show declining social activity 
with age. 

Laura Almeling at the
German Primate Center in
Göttingen, Germany, and her
colleagues studied Barbary 
macaques (Macaca sylvanus; 
pictured), and found that 
older females spent less time 
grooming others and interacted 
with fewer animals than 


younger individuals did. These 
changes were not explained by 
an overall reduction in social 
interest, as older males and 
females maintained an interest 
in pictures of other animals. 
Moreover, the monkeys’ 
interest in toys and other non- 
social objects decreased in early 
adulthood, mirroring humans’ 
declining eagerness for new 
experiences with age. 

People’s tendency to shrink 
their social circles as they age 
has previously been attributed 
to a sense that time is growing 
short, but the results in 
monkeys suggest that it may 
also be rooted in primate 
evolution, the authors say. 
Curr. Biol. http://dx.doi.org/10.1016/j.cub.2016.04.066
(2016)


PLANT BIOLOGY 


African trees cope 
with warming 


Some trees in Africa already 
seem to be adapting to the 
warming climate by using 
water more efficiently. 

Iain Robertson of Swansea 
University, UK, and his 
colleagues collected a small 
number of samples from 
three tree species in Ethiopia, 
Namibia and South Africa, 
covering a small area of the 
continent. By measuring the 
ratio of carbon isotopes in each 
tree ring, the team estimated 
the water-use efficiency of the 
trees from 1909 to 2003. They 
found that two of the three 
species increased their water- 
use efficiency — by an average 
of 25% — over the period. 

Using water more sparingly 
may help to compensate for the 
predicted decreases in rainfall 
in Africa, allowing some plants 
to cope better with climate 
change than others. 

J. Quaternary Sci. 31, 386-390 
(2016) 


T cells target solid 
tumours 


A cancer therapy that uses
genetically modified versions
of patients’ immune cells to
treat blood cancers has been
adapted to attack solid human
tumours implanted into mice.

Engineered T cells are 
designed to home in on 
specific proteins on the surface 
of cancer cells in the blood 
— but adenocarcinomas, a 
common type of solid tumour, 
rarely carry such markers. 
Avery Posey and Carl June of 
the University of Pennsylvania 
in Philadelphia and their 
colleagues developed a way to 
modify human T cells so that 
they recognize abnormal forms 
of a sugar molecule linked to
a cell-surface protein that is 
abundant in many cancers. 
The authors found that in
a mouse model of human
pancreatic adenocarcinoma, 
all animals treated with these 
T cells survived until the end 
of the experiment, compared 
with only 40% of untreated 
controls. 

Protein-linked sugars are 
a promising target for cancer 
immunotherapy, the team says. 
Immunity 44, 1444-1454 (2016) 


When pupfish got 
to Devils Hole 


A rare fish species living in an
isolated cavern pool probably 
originated when the cavern 
first opened to the surface 
around 60,000 years ago. 

The Devils Hole pupfish 
(Cyprinodon diabolis) is one of 
the world’s rarest animals, and 
researchers debate whether 
humans introduced the fish 
to the pool in Devils Hole 
in the southwestern United 
States between 20,000 and 
10,000 years ago. Ismail Saglam 
and Michael Miller at the 
University of California, Davis, 
and their colleagues analysed 
the genomes of the fish and two 
related pupfish species, and 
concluded that the Devils Hole 
pupfish became an isolated 
population in the cavern 
roughly 60,000 years ago. 

A geological event may have 
both opened up the cavern and 
introduced the pupfish into it, 
the authors suggest. 

Mol. Ecol. http://doi.org/bj78 
(2016) 




MATERIALS 


Self-folding 
mimosa mimic 


A bilayered material can 
curl itself into a cylinder 

in response to a stimulus, 
mimicking the leaves of the 
plant Mimosa pudica, which 
quickly fold up when lightly 
touched (pictured). 

Zuankai Wang at the City 
University of Hong Kong, 
Antonio Tricoli at Canberra's 
Australian National University 
and their team were inspired 
by the plant. They adhered a 
hydrophobic layer, polyvinyl 
chloride, to a hydrophilic one, 
polycaprolactone, then placed 
this bilayer on a flexible plastic 
substrate before cutting the 
resulting trilayer into a long, 
thin strip. When they placed 
a water droplet on one end of 
the hydrophilic side, the two 
sides of the strip quickly peeled 
away from the substrate and 
wrapped around the droplet. 
As the water spread down the 
strip, the bilayer’s edges curled 
with it to form a tube. 

Such a material, which can 
be cut into different shapes, 
could one day be useful in 
sensors and other devices that 
don't require power. 

Sci. Adv. 2, e1600417 (2016)
SEVEN DAYS


RESEARCH
CRISPR human trial 


The first human therapy to involve the CRISPR-Cas9 gene-editing
technology passed a major hurdle on 21 June, when a federal advisory
panel at the US National Institutes of Health approved a proposal
to use the technique to edit T cells, a type of immune cell, taken
from people with cancer. The trial, which would be run by the
University of Pennsylvania in Philadelphia, would simultaneously
enhance the T cells’ ability to destroy cancerous cells and protect
them from being attacked by the cells. US regulators have yet to
approve the trial. See page 590 and go.nature.com/28qkj6m for more.


FACILITIES 


Olympic doping lab 
The World Anti-Doping 
Agency (WADA) has 
suspended the accreditation 
of the laboratory in Rio de 
Janeiro, Brazil, that was to 
have handled anti-doping tests 
of urine and blood samples 
from athletes at the city’s 
upcoming Olympic Games. 
WADA announced on 24 June 
that the facility had failed to 
conform with its international 
laboratory standards — but 
did not specify why. Brazil has
had previous such troubles:
it lacked a WADA-accredited
lab for the Rio-hosted 2014
football World Cup. Football’s
governing body FIFA decided 
to fly samples to a lab in 
Switzerland for testing. 


EVENTS 


Brexit shock
The United Kingdom’s vote to
leave the European Union in
a referendum on 23 June has
left researchers scrambling
to protect their scientific
relationships and funding
streams. In a surprise to many
observers, 52% of voters
chose to leave the EU. In the
run-up to the referendum, a
number of senior academics
and research organizations
(and Nature) had voiced fears
that a vote to leave would be
highly disruptive to science.
See pages 589 and 597 for
more.


Two workers rescued in Antarctic mission

Two ill crew members were evacuated from
the US Amundsen-Scott South Pole Station on
22 June. A Twin Otter aeroplane operated by
Kenn Borek Air of Calgary, Canada, travelled to
the pole after stopping at Britain’s Rothera station.
It was only the third midwinter flight ever made
to the pole, following medical evacuations in
2001 and 2003. The National Science Foundation,
which oversees US research at the pole, did not
release the names or conditions of the patients;
both were flown to southern Chile and onwards
to receive medical treatment.


Coral crisis
More than 2,500 coral-reef
scientists, policymakers and
stakeholders have written
to the Australian prime
minister demanding that the
government stop approving
new coal mines, because
climate change is the major
threat to reef ecosystems. The
letter, sent on 25 June after
last week’s International Coral
Reef Symposium in Honolulu,
Hawaii, notes that Australia’s
Great Barrier Reef has been
devastated by bleaching this
year. Reef bleaching around
the world will worsen as
global temperatures rise.
The signatories say that the
government should “stop
endorsing the export of coal”
and halt plans for controversial
mines in Queensland.


India space record
India’s space agency set a
record on 22 June by launching
20 satellites into orbit in a
single mission — the biggest
number in the agency’s history.
Its previous record for a single
launch was ten satellites. The
payload, which launched from
a site in the eastern state of
Andhra Pradesh, included
13 satellites from the United
States. The achievement
brings the agency’s delivery
rate closer to those of NASA
and Roscosmos, and cements
India’s place as a major player
in the space industry.


Chinese rocket 
China's new Long March 7
rocket made a successful
maiden flight on 26 June.
The rocket, which launched
from Hainan Island, is
eventually intended for use in
transporting cargo and people
to a new Chinese space station
planned for 2022. It uses a
kerosene and liquid-oxygen
fuel, which is less toxic than
the propellants of older Chinese
rockets. The launch delivered
several satellites to low-Earth
orbit.


Trachea surgeon


Controversial surgeon Paolo 
Macchiarini, who pioneered 
transplants of artificial 
windpipes seeded with 
patients’ own stem cells, is 
facing preliminary charges of 
involuntary manslaughter in 
connection with two patients 
who died after surgery, public 
prosecutors in Stockholm 
announced on 22 June. 
Macchiarini is also suspected 
of causing grievous bodily 
harm to another transplant 
patient and to a patient 
undergoing a different type 
of operation, they said. In 
March, Macchiarini was fired 
from the Karolinska Institute 
in Stockholm — where he had 
worked since 2010 — after 
allegations of clinical and 
scientific misconduct. No 
formal charges have been 
brought and Macchiarini 
denies any wrongdoing. 


Helen Edwards dies 
Physicist Helen Edwards,
a driving force behind the
Tevatron particle accelerator
at Fermilab near Chicago,
Illinois, died on 21 June, aged
80. Edwards (pictured) led
the design and construction
of the Tevatron, which began
smashing together protons
and antiprotons in 1985; a
decade later, observations
of these collisions resulted
in the discovery of the top
quark. Edwards also worked
on accelerator designs for
future high-energy-physics
machines. The Tevatron
closed in 2011.


TREND WATCH

A Chinese computer tops the
list of the world’s 500 fastest
supercomputers, for the seventh
consecutive time. The leading
machine, Sunway TaihuLight at
the National Supercomputing
Centre in Wuxi, can make
93 quadrillion calculations per
second. It is almost three times
as powerful as the previous list’s
winner, Tianhe-2, also in China.
For the first time, China overtakes
the United States in number of
supercomputers in the biannual
TOP500 ranking. It had just one
machine on the list until 2000.


AWARDS 


Blavatnik awards 
The three winners of this 
year’s US Blavatnik Awards 
for Young Scientists were 
announced on 21 June. David 
Charbonneau at Harvard 
University in Cambridge, 
Massachusetts, was honoured 
for his work on observational 
astronomical methods used to 
search for chemical signatures 
of life in space. Phil Baran at 
the Scripps Research Institute 
in La Jolla, California, won 
for his research on the use of 
chemical synthesis to design 
scalable, efficient routes to 
potential new drugs. Michael 
Rape at the University of 
California, Berkeley, was 
rewarded for his discoveries
in cellular signalling involving
the protein ubiquitin. Each 
person receives US$250,000 
— the largest unrestricted 
cash prize for early-career 
scientists. The prizes are 
awarded annually by the 
Blavatnik Family Foundation 
and the New York Academy 
of Sciences. 


POLICY 


Looser drone rules 
The United States has markedly 
relaxed its rules that govern the 
use of small drones, clearing 
the way for commercial — and 
many scientific — applications. 
The policy, announced by
the White House on 21 June,
had been under development
at the Federal Aviation
Administration (FAA) for
years. Many scientists had
been unable to use drones
for research because the
machines could not be flown
for ‘commercial’ use, which
included research and teaching
activities at private universities.
The latest rules, which apply
to drones weighing less
than 25 kilograms, require
commercial operators to be
certified with the FAA. Drones
must be kept within the
operator's line of sight.


Chemical control 


Long-awaited reforms to
US chemical regulations were
signed into law on 22 June
by President Barack Obama.
The update to the 1976 Toxic
Substances Control Act
gives the US Environmental
Protection Agency greater
authority to ensure the safety
of chemicals — both old and
new. Under the revised law,
the agency can request more
information from chemical
manufacturers and even
compel firms to conduct
extra safety studies. Several
previous attempts to overhaul
the law had failed over the
past decade.


NASA travel ban
NASA has effectively banned
its employees and contractors
from attending a major
space-research conference
that begins in Istanbul,
Turkey, on 30 July, citing
security concerns. An internal
memo dated 21 June reports
that NASA head Charles
Bolden made the decision
to not sponsor or process
travel to the Committee on
Space Research (COSPAR)
assembly in line with travel
warnings issued by the US
state department. Lennard
Fisk, a space scientist at the
University of Michigan in
Ann Arbor and president of
COSPAR, decried the decision
as giving in to terrorist threats.


SUPERCOMPUTER SUPERPOWER

China has ended the United States’ dominance in supercomputing,
overtaking it in number of machines, as well as speed.


7 JULY
A Soyuz rocket launches
to take Anatoly
Ivanishin, Kate Rubins
and Takuya Onishi to
the International Space
Station.


27 JUNE-2 JULY
The Starmus festival
in the Canary Islands,
Spain, brings together
astronomy, art and
music with speakers
including Brian May,
Stephen Hawking and
Brian Eno.
www.starmus.com


The UK vote to leave the European Union has sparked huge uncertainty across the continent.


UK scientists in limbo 
after Brexit shock 


Researchers organize to lobby for science as country prepares for life outside the EU. 


BY ALISON ABBOTT, DANIEL CRESSEY AND 
RICHARD VAN NOORDEN 


The dust from last week’s vote by the
United Kingdom to leave the European
Union is nowhere near settled, but the
country’s researchers are already bracing for
the fallout.
On 23 June, 52% of those who voted in the
country’s referendum came out in favour of
leaving the EU. No one is sure how ‘Brexit’ will
affect science, but many researchers are worried
about long-lasting damage. Beyond the
immediate economic impacts and the potential
loss of EU funding — which currently
supplies some 16% of UK university research
money — scientists fear a loss of mobility
between the country and the continent.

“I was on a career panel only yesterday, singing
the praises of the UK as a wonderful place
of opportunity for young scientists, and I feel
like that has changed overnight,” said Vanessa
Sancho-Shimizu, an infectious-diseases
researcher at Imperial College London, in
response to a Nature survey last Friday. She is
a Spanish national and one of many scientists
who expressed similar views.

Researchers are already mobilizing to lobby
for the United Kingdom to remain a participant
in EU science programmes, and for
domestic funding to make up
any shortfalls. “We need some
kind of rapid monitoring to catch
fallout problems early and implement
remedial measures,” says
Mike Galsworthy, who led the
Scientists for EU campaign.

“If the science community
wants to have an impact on the
UK’s negotiation strategy, it needs
to clearly know what its own priorities
are and start the process of
making that case, strongly,” says
John Womersley, chief executive
of the UK Science and Technology
Facilities Council. Getting a
guarantee to remain part of Horizon 2020,
the EU’s €74.8-billion
(US$82.9-billion) programme
of research grants, should be the
community's top — and only —
objective, he adds.

Jamie Martin, an independent
education consultant who advocated
for Brexit, offers “total reassurance”
to worried scientists.
Most academic groups had lobbied
for the United Kingdom to remain in the
EU. Martin says that “the good news for them
is that the people at the top of the Vote Leave
campaign share their instincts on science”.
This includes being open to skilled people
from other countries and understanding the
importance of continued funding, he says.


PEOPLE 

Exactly when the United Kingdom will leave
the EU is unclear. There is no set date for the
government to invoke ‘article 50’ of the EU
Lisbon treaty, but once it does, it will trigger
a process of negotiation that must conclude
within two years. Campaigners for a Leave
vote — including former London mayor Boris
Johnson, who many expect will lead the next
government — have said that there is no need
to do this immediately, and informal negotiations
with the rest of the EU can take place first.

Those in favour of Brexit say that a United
Kingdom outside the EU could allow in more
skilled researchers while still driving down
overall immigration numbers. ‘Leave’ campaigners
have advocated a points-based immigration
system such as Australia’s, which would
attempt to level the playing field between EU
and non-EU researchers.

But it’s unclear whether the United Kingdom 
will still be attractive to talented researchers. 
Some have said that they feel less welcome in 
the country as a result of both the vote and 
the campaign leading up to it, which featured 
highly charged rhetoric around immigration. 


MONEY 

Even laboratories staffed primarily by UK
nationals could feel the pinch. EU research
funds have supplied an estimated €8 billion
to the country over the past decade.


Mike Galsworthy wants careful monitoring of UK research to spot any fallout.

The United Kingdom is also by far the 
largest recipient of loans to EU universities 
and research institutions from the European 
Investment Bank (EIB), receiving more than 
€2.8 billion since 2005 — some 28% of total 
EIB loans for higher education and research 
over that period. Agreed loans are secure, but 
the fate of those that are just beginning to be 
considered is unclear, says EIB spokesman 
Richard Willis. 

Leading campaigners for the Leave side 
pledged before the vote that universities and 
scientists in the United Kingdom who now get 
funding from the EU “will continue to do so”. 

The country could try to negotiate access similar
to the agreements that 15 other non-EU
countries currently hold within Horizon 2020.
But that might not be possible if
the country acts to restrict free movement
of people, as many Leave supporters have
demanded. Switzerland, a non-EU member,
is an associated country, but its researchers
were cut out of full access to Horizon 2020
after the nation voted in a 2014 referendum
to restrict immigration.

“The long-term future worries the hell out
of me,” says Steven Cowley, who directs the
Culham Centre for Fusion Energy in Abingdon,
UK. The centre operates the Joint European
Torus (JET), a nuclear-fusion facility,
on behalf of the European Commission. The
contract for JET runs out in 2018, but Cowley
says he is confident that it will be extended,
because it provides crucial expertise for
ITER, the international fusion experiment
under construction in southern
France. The real problem, he
says, is that the United Kingdom
will not be able to compete to
host the next major European
facility.

As for ITER itself, the EU is 
one of seven major international 
members of the project. The 
United Kingdom will have to 
rejoin it, either as an individual 
nation member — which would 
mirror its membership of CERN, 
the European particle-physics 
lab — or perhaps with ‘associate 
member’ status similar to that 
held by Switzerland. 


POLICY 

A UK exit from the EU could also 
reshape the policy landscape for 
the countries that remain in the 
bloc. 

Germany, Italy and Austria
are among the nations that have
opposed EU funding for research
on human embryonic stem cells.
Others, including the United Kingdom and
Sweden, called for research to be funded
under appropriate ethical oversight — leading
to a deal in which research collaborations
can be funded as long as partners from
countries where the research is forbidden
do not handle human embryonic stem cells
themselves. The United Kingdom was “in the
forefront of guiding us into an acceptable and
workable way around the issues”, says stem-cell
researcher Christine Mummery of the
Leiden University Medical Center in the
Netherlands. “If the UK cannot participate
in decisions like this, it makes me nervous.”

Other European scientists fear for the future 
of their own countries’ science bases if the UK 
vote empowers other anti-EU movements. 
Right-wing populist politicians in France, the 
Netherlands and Denmark are already calling 
for their own referendums. 

James Wilsdon, a science-policy researcher 
at the University of Sheffield, UK, says that 
beyond the questions about continued access 
to EU funding and policy, there is a more 
fundamental issue that UK researchers must 
come to grips with: the fact that most academic 
experts, research lobby groups and other 
experts came out in favour of staying in the 
EU and were ignored by the public. 

“Here you have such a major question 
around which there was such a torrent of 
solid analysis and empirical evidence, and 
we've had a rejection of that by 52% of the 
public,” he says. “That needs to provoke
some serious soul searching and reflection.”
SEE EDITORIAL P.589


Additional reporting by Davide Castelvecchi
and Elizabeth Gibney.


MISSION TO JUPITER


NASA's Juno spacecraft comes
armed with a suite of instruments
that will measure auroras, map
magnetic fields and dig deep into
the planet’s atmosphere.


NASA spacecraft
nears Jupiter


Juno will explore the gas giant’s composition and mysteries. 


BY ALEXANDRA WITZE 


On 4 July, NASA intends to finish a job
that started with the agency’s Galileo
mission 21 years ago. At 8:18 p.m.
Pacific time, the Juno spacecraft will ignite its
main engine for 35 minutes and nudge itself
into orbit around Jupiter. If all goes well, it will
eventually slip into an even tighter path that
whizzes as close as 4,200 kilometres above the
planet’s roiling cloud-tops — while dodging as
much of the lethal radiation in the planet’s belts
as possible.

The US$1.1-billion mission, which 
launched in 2011, will be the first to visit the 
Solar System's biggest planet since NASA's 
Galileo spacecraft in 1995. Picking up where 
Galileo left off, Juno is designed to answer 
basic questions about Jupiter, including what 
its water content is, whether it has a core and 
what is happening at its rarely seen poles (see 
‘Mission to Jupiter’). 

Scientists think that Jupiter was the first 
planet to condense out of the gases that 
swirled around the newborn Sun 4.6 billion 
years ago. As such, it is made up of some of the 
most primordial material in the Solar System. 
Scientists know that it consists mostly of 
hydrogen and helium, but they are eager to 
pin down the exact amounts of other elements 
found on the planet. 

“What we really want is the recipe,” says Scott
Bolton, the mission’s principal investigator and
a planetary scientist at the Southwest Research
Institute in San Antonio, Texas.


A MURKY DISPOSITION 

Jupiter's familiar visage, with its broad brown
belts and striking Great Red Spot, represents
only the tops of its churning clouds of ammonia
and hydrogen sulfide. Juno — named after
the Roman goddess who could see through
clouds — will peer hundreds of kilometres
into the planet's atmosphere using microwave
wavelengths.

Exploration of Jupiter’s interior should 
reveal more about the formidable atmospheric 
convection that powers the planet, says Paul 
Steffes, an electrical engineer at the Georgia 
Institute of Technology in Atlanta. 

Steffes and his colleagues have run a series
of laboratory experiments to simulate what
different layers of Jupiter’s atmosphere might
look like: from near the cloud-tops, where
experimental temperatures are -100°C, to
deeper in the planet, where they rise to more
than 300°C.

By comparing Juno’s observations to their
simulations, the scientists hope to determine
how much ammonia, water vapour and other
materials swirl at different atmospheric depths.
“Once we understand the recipe for Jupiter’s
atmosphere, we'll get a clearer insight into
how it evolved,” says Steffes. Different theories
predict varying amounts of water in Jupiter's
atmosphere, depending on whether the planet
coalesced at its current distance from the Sun
or somewhere else. Actual measurements of
atmospheric water content could help to settle
this debate.


NORMAL IS GOOD 

In anticipation of Juno’s arrival, professional
and amateur astronomers have been observing
Jupiter with ground-based and space-based
telescopes. For now, the planet is not experiencing
any unusual atmospheric changes. “It's
kind of in its normal state, which is good,” says
Amy Simon, a planetary scientist at NASA’s
Goddard Space Flight Center in Greenbelt,
Maryland. This ‘normal’ behaviour gives
researchers confidence that they will be able
to understand Juno’s findings.

The Great Red Spot continues to shrink,
as it has done in recent years, and to interact
less and less with the jet streams on either of its
edges. The broad belt just north of the planet's
equator has been expanding since late 2015
— a change that might be connected to
processes deep in the atmosphere.


Jupiter’s Great Red Spot also reveals a mosaic of currents that swirl through the planet’s atmosphere.

“Trying to connect events that are 
happening at one level to events happening 
in another tells you how well coupled the 
whole atmosphere is,” says Leigh Fletcher, 
a planetary astronomer at the University of 
Leicester, UK. 

As Juno probes deeper and deeper into 
the planet’s atmosphere, researchers hope 
to get information on a layer of hydrogen 
compressed into a liquid by increasing 
pressures. That liquid conducts electricity, 
which powers Jupiter’s enormous magnetic 
field. Deeper still, the spacecraft will look 
for evidence of a core — a dense nugget of 
heavier elements that most scientists think 
exists, but has never been observed. Juno will 
make precise measurements of how Jupiter's 
gravity tugs on the spacecraft, which should 
reveal whether a core is present. 


POLE POSITION 

Juno will also get an unprecedented glimpse
of Jupiter’s poles. To avoid the most dangerous
radiation belts that surround the
gas giant — which over the lifetime of
the mission could fry the spacecraft with
the equivalent of more than 100 million
dental X-rays — Juno will take a long
elliptical dive around the planet on every
orbit. The spacecraft will fly directly over
Jupiter's magnetically intense auroras, and
could spot unusual circulation patterns that
resemble a hexagon-shaped feature parked
on Saturn’s north pole.

The lessons that scientists learn from
Jupiter will apply to other gas giants, including
those outside the Solar System. “If we
understand how it formed, we'll have a
much better handle on giant-planet influences
in planetary systems around other
stars,” Fletcher says.

Juno will provide scientists’ last chance to
look at Jupiter for a long time. It is scheduled
to make 37 total orbits before performing
a kamikaze run in early 2018, burning
up inside the planet's clouds to keep it from
contaminating the moon Europa. The only
other mission planned to the gas giant is
the European Space Agency’s Jupiter Icy
Moons Explorer (JUICE) spacecraft, which
could launch as early as 2022 and will focus
mainly on the moon Ganymede.


Population growth and agriculture have stressed the Indus, which flows the length of Pakistan. 


CLIMATE CHANGE 


Indus River
waters shrinking


Cooler, cloudier summers slow snowmelt in Himalayas. 


BY JANE QIU


The Indus River, which supports the lives
of 300 million people, is supplying Pakistan
with less water than it did 50 years
ago, particularly in the spring and summer,
researchers have found. The news comes as
demand for water is projected to rise sharply.
The findings contradict previous predictions
that the river’s volume would stay the same, or
even grow, as climate change kicks in, although
another team has found that such an increase
is still likely to occur in the next several decades.
Danial Hashmi, a hydrologist at the Pakistan
Water and Power Development Authority
in Lahore, reported the river’s shrinkage
for the first time in February at a conference in
Kathmandu. Further data from India have also
shown seasonal shifts. “The Indus is certainly
changing, and local communities are feeling
the pinch,” Shresth Tayal, a glaciologist at the
Energy and Resources Institute in New Delhi,
told a meeting in Columbus, Ohio, last month.

The Indus flows through India, Afghanistan
and China before reaching Pakistan, which it
crosses from north to south. For decades, population
growth and agriculture have stressed the
river, which, for 10 months of the year, dries up
before it reaches the sea. Because demand is set
to rise by 30% by 2025, “water shortage will be
the single most destabilizing factor, not only for
Pakistan but the entire region”, says Arif Anwar,
principal researcher at the International Water
Management Institute in Lahore.

But since the 2009 ‘glaciergate’ scandal — in
which it emerged that the Intergovernmental
Panel on Climate Change had mistakenly
included in its fourth assessment report
a prediction that the Himalayan glaciers
would disappear by 2035 — there has been a
widespread belief that water resources in the
region are stable, at least for now. Research
by several groups even suggested that climate
change might provide some relief in the short
or medium term, thanks to faster melting of
the glaciers that supply the river, and increased
precipitation.

Hashmi’s data, which are unpublished, come
from a network of hydrological stations in Pakistan
that span the main stem of the Indus and
three of its tributaries. They show that the total
water supply fell by 5% between 1962, when the
hydrological stations were built, and 2014.

“A reduction of 5% over five decades may not
seem a lot,” says Walter Immerzeel, a hydrologist
at Utrecht University in the Netherlands,
who led one of the studies that projected an
increase in water supply in the Indus (A. F. Lutz
et al. Nature Clim. Change 4, 587-592; 2014).
“But if the trend persists, there could be devastating
implications for water resources.”

Hashmi’s team finds that the river’s
shrinkage is seasonal, with a decrease in
flows between April and August that exceeds
a slight increase during the rest of the year.
And it reports a temperature drop across the
four Pakistani river basins in the summer
months — even though the region is getting
warmer overall. Because snowmelt and glacier
melt contribute 50-85% of river flow in
those catchments, the team suspects that cooler
springs and summers result in less melt and
that this can explain the shrinking river.
“It's a fascinating finding,” says Tobias Bolch,
a glaciologist at the University of Zurich in
Switzerland. He notes that it is consistent with a
phenomenon known as the Karakoram anomaly,
in which some of the glaciers in the region
have become stable or even grown — in contrast
to most mountain glaciers globally,
which are retreating rapidly in response to
climate change.

Another study presented at the February
meeting suggested a possible reason for
the region’s cooler summers. As the overall climate
warms, monsoons increasingly invade the
mountain chains of the Indus upstream, where
glaciers reside, says study co-author Hayley
Fowler, a climate modeller at Newcastle University,
UK. Her modelling work shows that
when monsoons penetrate into the region and
push dry westerly winds northward, summer
temperatures drop. The team suspects that
monsoonal clouds hovering over a region that
is normally hot and dry in the summer may
have a cooling effect.




The limitations of climate models and the
scarcity of field measurements in the region
make it hard to predict how Himalayan water
resources will change, says Immerzeel. However,
the latest work by him and his collaborators
— which took the Pakistani data into
account — finds that things will get much
worse, but only in the long term. Using
state-of-the-art climate models, and assuming a
scenario in which global greenhouse-gas emissions
peak around 2040, the team found that
the flow of water in the river system will stabilize
or even increase in the next few decades —
consistent with its previous results. But once
glaciers have become depleted and regional
temperatures have started to rise, water scarcity
will ensue: the researchers predict a 15%
drop between 2071 and 2100 compared with
1971-2000 levels, Immerzeel says. The team
has submitted a paper for review.

In any case, there is a pressing need for Pakistan
to boost its water-storage capacity and efficiency
of water usage, says Mobin-ud-Din Ahmad, a
hydrologist at the Commonwealth Scientific
and Industrial Research Organisation in Canberra,
Australia. Right now, its reservoirs can
hold only 30 days’ worth of the country’s water
needs — compared with 800 days in Australia
and 150 days in India. “It’s an extremely dangerous
situation, especially now, when severe
droughts are increasingly common,” he says.


Preprint website plans revamp


But users are wary of major changes to arXiv repository.


BY RICHARD VAN NOORDEN 


A multimillion-dollar funding drive is
being readied to transform arXiv, the
vastly popular repository to which
physicists, computer scientists and mathematicians
flock to share their research preprints
openly.

But the results of an enormous user survey 
published this week suggest that researchers 
are wary of drastic changes to a site that has 
become an essential part of the infrastructure 
of modern science. 

Last year, the site served up around
139 million downloads, and it now holds more
than 1.1 million free papers. But it is being sustained
by fragile code, donations from libraries
and a charitable foundation, and the goodwill
of about 150 volunteer moderators,
says the site's programme director, Oya Rieger.
With its 25th anniversary approaching in
August, arXiv’s advisory teams of scientists and
librarians are considering a plan that involves
raising US$2.5 million to $3 million to modernize
the platform. That will sit on top of its
$1-million annual budget for staff and servers.

To attract support from donors, arXiv’s
operator, Cornell University Library in Ithaca,
New York, is hoping to come up with a
“compelling vision”, Rieger says.

Scientists seem to love arXiv: 95% of the
survey's 36,000 respondents said that they
were very satisfied or satisfied with it. And
most want to keep it just the way it is, although
perhaps with some modernization. They were
enthusiastic about the possibility of tweaks to
improve the site's search functions, and about
allowing references to be hyperlinked directly
to research papers, for example (see ‘What do
arXiv users want?’). Some wanted the site
to broaden into new subject areas, such as
chemistry — although such expansion would
require the recruitment of scientists who are
willing to moderate the manuscripts, notes
David Morrison, chair of arXiv’s scientific
advisory board.


WHAT DO ARXIV USERS WANT?


SOCIAL FORUM 

When asked whether arXiv should embark on
more transformational changes, respondents
gave mixed answers. In particular, some questions
focused on whether it should develop
into a social forum that allows scientists to
comment on papers or leave ratings. A
few social-media sites have already
been built around the repository for
just such purposes
— such as SciRate
and Arxiv Sanity Preserver — and some argue
that the site itself should begin to incorporate
such functionalities. “ArXiv should be more
dynamic — allowing readers to filter the wheat
from the chaff,” says Alan Aspuru-Guzik, a
quantum chemist at Harvard University in
Cambridge, Massachusetts. But one-third of
respondents said that this wasn’t important
or that arXiv shouldn't be doing it. Only 34%
voted in favour of such changes.

That response points to a tension between
researchers who want to see the site incorporate
aspects of open review, and those who
want it to stick to its core mission of allowing
rapid exchange of scholarly papers, says
Rieger. There were hints of a generational
divide, with those aged under 30 more in
favour of allowing comments. But even those
who wanted a more social site said that they
were keen to avoid a commenting free-for-all,
Rieger adds.


The preprint repository got high marks overall from 36,000 respondents to a survey, but there was no
consensus over whether the site should add social-media functionalities.

“The message was more or less ‘stay focused 
on the basic dissemination task, and don't get 
distracted by getting overextended or going 
commercial’,” says Paul Ginsparg, a physicist 
at Cornell University who launched arXiv in 
1991 as a pre-World-Wide-Web-era bulletin 
board. 


CHECKS AND BALANCES 

Ginsparg notes, however, that arXiv’s users 
sometimes don’t know what they want until 
they get it. Researchers said that they liked the 
quality control now built into the site, includ- 
ing checks of papers for text overlap with 
other reports (potential plagiarism), classify- 
ing papers into the correct subject areas and 
rejecting work that has little scientific value. 
“These are for the most part things that users 
never actually requested,” Ginsparg says. 
In the past 5 or so years, he has introduced 
automated machine-learning code that filters 
through the more than 9,000 papers submit- 
ted each month and flags up potential issues 
to human moderators. 

In September, arXiv’s advisory boards will 
meet to draw up a road map for progress and 
to discuss how to get the funds needed to 
modernize the site. The site is currently sus- 
tained by member institutions (mainly librar- 
ies, but also some research funding agencies) 
and by the Simons Foundation in New York. 
But some discussions have been held with 
other potential contributors such as the US 
National Science Foundation. It is also possi- 
ble that publishers or scientific societies could 
be asked to contribute, says Rieger. 

She adds that the site will need to be careful 
to remain objective. “We want to make sure 
that arXiv continues to be a neutral, trusted 
service,” she says. 


CORRECTION 

The science-workforce graph in ‘China by 
the Numbers’ (Nature 534, 452-453; 2016) 
erred in stating that China’s population is 
1.3 trillion. It is more than 1.3 billion. 


SOURCE: ARXIV 


Dolly was born 
20 years ago. 


BY EWEN CALLAWAY 


JEFF J. MITCHELL/REUTERS 


The story of the world’s 
most famous sheep, 
from the people who 
brought her to life. 


THE CAST: Karen Walker, embryologist, PPL Therapeutics, Roslin, UK, now director, KXRegulatory, Linlithgow, UK; 
Bill Ritchie, embryologist, Roslin Institute, now at Roslin Embryology; Angela Scott, cell-culture technician, PPL, now 
chief operating officer, TC BioPharm, Motherwell, UK; Alan Colman, research director, PPL, now at Harvard University, 
Cambridge, Massachusetts; Ian Wilmut, embryologist, Roslin, now University of Edinburgh, UK; John Bracken, farm 
research assistant, Roslin, now retired; Angelika Schnieke, molecular biologist, PPL, now Technical University of Munich, 
Germany; Harry Griffin, scientific director, Roslin, now retired; Jim McWhir, stem-cell scientist, Roslin, now retired. 


Dolly, the first mammal 
cloned from an adult cell, 
was born 5 July 1996. 
But she was created five 
months earlier, in a small 
room at the Roslin Institute, 
outside Edinburgh, UK. 


Karen Walker: On the day we made Dolly, we 
had such a rubbish day. 


Bill Ritchie: It was 8 February 1996. I looked it 
up. We do know it was a rubbish day: we had 
various problems with infections and things. 


Walker: It’s a shame the building has been 
demolished, otherwise you could see the room 
in which Dolly was made. I use the word ‘room’ 
loosely, because it really was just a big cup- 
board, which, when Bill and I were in there, you 
could just get two chairs and the incubator in. 


Ritchie: It literally was the cupboard. It was the 
storage cupboard at the end of the lab. When 
we got camera crews in later, they couldn't 
believe it, there was no room to shoot. 


Walker and Ritchie were part of a project at the 
Roslin Institute and spin-off PPL Therapeu- 
tics, aiming to make precise genetic changes 
to farm animals. The scientific team, led by 
Roslin embryologist Ian Wilmut, reasoned that 
the best way to make these changes would be 
to tweak the genome of a cell in culture and 
then transfer the nucleus to a new cell. 


Ritchie: The simple way of describing nuclear 
transfer is that you take an oocyte, an unferti- 
lized egg, and you remove the chromosomes. 
You then take a complete cell which contains 
both male and female chromosomes — all of 
our cells do, apart from the gonads. You take 
that cell and fuse it to the enucleated egg, acti- 
vate it — which starts it growing — and transfer 
it to a surrogate mother. Hopefully, with your 
fingers crossed, you will get a cloned offspring, 
a copy of the animal you've taken that cell from. 


Walker: Tedious is absolutely the word. You're 
sitting, looking down a microscope and you've 
got both hands on the micromanipulators. It's 
kind of like the joysticks kids use nowadays on 
games. If your elbow slipped, you could wipe 
the whole dish out. 


A year earlier, the team had produced twin 
sheep, named Megan and Morag, by cloning 
cultured embryonic cells in an effort spear- 
headed by Roslin developmental biologist 
Keith Campbell. But on this day in February 
1996, problems with the fetal cell lines they 
had planned to use meant that they would 
need another nuclear donor. 


Walker: My memory is of flapping like a 
chicken, thinking, “What are we going to put 
in?” because the cells we were going to use 
aren't there. The last thing you want to do is 
waste those oocytes you've got. We wanted to 
try something, at least. 


Angela Scott: I received word from Karen to 
say that the cells they were expecting had been 
contaminated. They asked me if I had any cells 
that they could use. The cells I had were ovine 
mammary epithelial cells: we were looking to 
increase expression of proteins in milk. These 
were adult cells. 


Alan Colman: I had come from a background 
of nuclear transfer with John Gurdon [a developmental 
biologist at the University of Cambridge, 
UK]. He’d never been able to get an 
adult frog by using nuclear transfer from an 
adult cell donor. He’d been able to get tadpoles 
using adult cells, but he’d never been able to get 
an adult frog. I didn’t think it would work with 
adult cells at all. But we had no other cell line 
to go with, so we all agreed that we’d use these 
mammary-gland cells and just see what happened, 
gain some experience. These were from 
a 6-year-old sheep, middle-aged for a sheep. 


NATURE.COM: To watch an interview with the Dolly team, visit go.nature.com/28ouboa 


Ian Wilmut: This is something that is got wrong 
to this day. Dolly is described as the first mam- 
mal cloned from an adult cell. She’s actually the 
first adult clone, period. She’s often undersold. 


Although cloned and transgenic cows would 
be more valuable for industry, the Roslin team 
worked with sheep for practical reasons. 


Wilmut: Cattle are incredibly expensive and 
have a long generation interval. Sheep are 
much less expensive and much easier to work 
with. And we knew the reproductive biology. 
It was very likely that if we could make some- 
thing work in sheep, it would work in cows. 
Sheep are small, cheap cows. 


John Bracken: There would be 40-60 animals 
going through surgery [to retrieve oocytes or 
implant embryos in surrogates] each week 
during the breeding season. It’s a lot of differ- 
ent sheep in the system, and that had to be very 
accurately monitored so the animals were at the 
right place at the right time. 


Walker: Bill used to keep the embryos and 
oocytes — when he was bringing them back up 
from the farm — in his top shirt pocket. I didn't 
have a top shirt pocket, so I used to tuck them 
inside my bra. It was a way to keep them warm 
and fetch them back into the lab and get them 
into a proper controlled environment. I don't 
think inside my bra was terribly controlled, but 
neither was Bill’s top shirt pocket. 


Ritchie: On the day we made Dolly, I would 
have done the enucleation, and she would have 
done the fusion. That was our normal way of 
doing things. 


Walker: I did the fusion on the day we made 
Dolly. Bill and I joke that he’s the mum and I'm 
the dad because, essentially, I was the mimic to 
what the sperm would do. 


They transferred 277 nuclei from the 
mammary cell line — from a white-faced 
breed known as a Finn Dorset — into eggs 
from the hardy Scottish blackface breed. Just 
29 of the resulting embryos were implanted 
into surrogate ewes. Expectations were low: 
it seemed almost impossible that an adult 
cell nucleus could be reprogrammed to give 
rise to a live animal. Most cloned embryos 
aborted, many even before a pregnancy could 
be determined with ultrasound. 


Wilmut: The sheep breeding season begins 
in October and ends in February, March-ish. 
By Christmas, we had established pregnan- 
cies after transfer from fetal cells, so that was 
going well. If we hadn't done that, we probably 
wouldn't have gambled on working with what 
became Dolly, the mammary cells. 


Angelika Schnieke: I remember meeting 
Ian Wilmut in the canteen, and he was very 
sceptical. He said: “I would be surprised if it 
works, but PPL is paying for the experi- 
ments, so we're doing them.” 


Bracken: We scanned all the recipients 
that had embryos transferred, and we 
knew they were important sheep. 
Every day that the scientists knew 
we were scanning, they would be 
very keen to know if there were any 
pregnancies. 


Walker: I didn't go down to watch all 
the scans. But with Dolly, because 
we knew that those were cells Bill and 
I had put in, I had gone down on that 
particular day with John. 


Bracken: I was just really pleased that it was 
a pregnancy. I didn't realize the real importance 
of it because we weren't really told. We just knew 
it was an important pregnancy. It didn’t carry 
the same weight. We weren't thinking, “Wow! 
If this progresses to a live lamb, this is going to 
be a world beater, or it’s going to turn scientific 
understanding on its head.” 


Walker: I’d taken a blank video up with me, so 
that I could show my colleagues. That video is 
sitting up in my loft, and to my shame, I have 
never yet transferred it onto DVD. I should. 


Schnieke: I remember the day when we had the 
first scan. We always asked. And then we saw 
the picture and the scans. Then you just have to 
hope that it lasts and goes all the way through. 


Wilmut: My memory is they were look- 
ing around day 30 or 35, so there's another 
120 days [until the birth], where you keep on 
sighing with relief and hoping. 


Just a few of the team members got to 
witness her birth. 


Bracken: It happened about 4:30 in the after- 
noon. As soon as she went into labour, we called 
the Dick Vet [the Royal School of Veterinary 
Studies in Edinburgh] to get one of their vets 
to come out. Even though [farm research assis- 
tant] Douglas McGavin and myself probably 
had 50 years of experience between us, it just 
would have been unheard of if we’d decided we’d 
assist the birth and something had gone wrong. 


Ritchie: We knew Dolly was about to be born, 
and I think she was showing signs of get- 
ting near lambing, and lo and behold I went 
through and there were bits of Dolly being 
born. There was a vet there, so she made sure 
the animal was okay and pulled the lamb out. 


Bracken: It was absolutely normal. No complications 
whatsoever. She was a very viable 
lamb. She got on her feet very quickly, probably 
within the first half hour, which is a really good 
indication that things are normal. 


Photo: Ian Wilmut with Dolly on display at the National Museum of Scotland. 


Ritchie: I think I was jumping up and down 
when I saw that white face. 


Scott: Karen was away at a wedding at the time. 


Walker: I had given her the fax number of the 
hotel. I wish I had kept that fax. It said: “She has 
a white face and furry legs.” 


Scott: I don't know what they must have thought 
at the hotel: “Wow, that’s a really unusual baby.” 


Wilmut: I was in the allotment. I had a phone 
call to say we had a live lamb. I issued an instruc- 
tion that nobody should be there who didn't 
have to be there. Lots were curious. I obeyed my 
own rule because I'd got nothing to contribute. 


Bracken: I’m standing next to Douglas 
McGavin watching the vet assist this birth, and 
I made an off-the-cuff remark to Douglas. I 
said, “You know what we're going to have to 
call this lamb? We’re going to have to call it 
Dolly”, after Dolly Parton, because the cells 
are derived from mammary tissue. 


Wilmut: Being somewhat puritanical, I might 
have been a bit worried. With hindsight, 
without a doubt it was a great name. 


Bracken: This is hearsay. I never got told 
this directly. But I heard they had contacted 
Dolly Parton and said: “We've 
got this cloned sheep that’s named 
after you.” 


Wilmut: I don’t know how the mes- 
sage came through, but we were 
told her agent had said: “There was 
no such thing as baaad publicity.” I 
don't know if that’s true. 


Over the next few months, Wilmut’s 
team confirmed that Dolly was a 
clone of the mammary cell line, and 
wrote up the results. Her birth was to 
be kept top secret, until the Nature paper 
describing the experiment could be published 
in February 1997 (I. Wilmut et al. 
Nature 385, 810-813; 1997). 


Harry Griffin: Two or three months before the 
publication of the paper, I got to know about 
it. In terms of preparation, PPL were involved. 
They saw it as an opportunity to get publicity for 
themselves. We worked with their PR company, 
De Facto. We did quite a bit of preparation. 


Wilmut: Ron James, who was the chief 
executive of PPL Therapeutics, and I were cited 
as the primary spokesmen and given a bit of 
training by ex-BBC people, who first of all 
came up and fairly aggressively stuck micro- 
phones up our noses and asked aggressive 
questions, and subsequently did it very gently. 
We weren't approached in anywhere near the 
aggressive way they tried first, which was quite 
shocking. I’m sure it was worth having. 


Griffin: We had everything organized. The 
calls would be directed to De Facto and they 
would try and organize some coherence in our 
response in terms of who got priority and who 
didn't. All this would culminate, we hoped, on 
the Thursday that the paper came out. What 
was that, 27 February? Clearly, it didn't. 


Wilmut: Robin McKie at The Observer leaked 
it. He will deny the charge. 


Robin McKie, science and technology edi- 
tor, The Observer, London: I didn’t see that 
stuff in Nature. I don't blame him for being 
angry, but I went to great pains to avoid the 
things that would get me to be accused of that. 
I had helped a couple of guys who were making 
a TV programme about genetics, and they 
said, “Oh, by the way, they've cloned a sheep 
in Edinburgh.” I didn’t believe them, but I 
phoned a few people in the field, and one of 
them in America confirmed it. But I was very, 
very worried. I was saying something quite 
sensational, with absolutely no paper proof of 
anything that had gone on. I told my deputy 
editor everything I knew, and he made me 
write it. Then the shit hit the fan. 


Griffin: Ian gave me a call and said he’d just 
been called up and told that The Observer was 
going to run the story on the Sunday prior to 
publication in Nature. 

Ian and I went into the institute at about 
9 a.m. on the Sunday, not knowing whether or 
not people could get through. The phone rang 
continuously. We had a bizarre circumstance 
where a phone started ringing in a cleaning 
cupboard. When I answered it, it was, I think, 
the Daily Mirror, who had somehow got this 
particular connection. About half past nine at 
night, we went home. 


JEFF J. MITCHELL/REUTERS 


Jim McWhir: I remember coming in on the day 
after the embargo broke and there were several 
satellite vans in the car park. 


Wilmut: There were television trucks every- 
where. I went and spoke on Good Morning 
America. 


Griffin: CBS, NBC, ABC, BBC, all there want- 
ing interviews with Ian, wanting to see the 
sheep. It was chaos. I don’t think you can ever 
appreciate the intensity of the media in full 
flight unless you've experienced it yourself. 


McWhir: It was just pandemonium. Going 
down to the large-animal unit, it was just a 
forest of flash bulbs and reporters. It was quite 
amazing. I just turned around and went back 
to work. 


Griffin: My secretary would put the phone 
down, and it was ringing immediately. 
One of the names I heard being mentioned 
was Harold Shapiro [then chair of the US 
National Bioethics Commission]. She said, 
“Ian Wilmut can’t talk to you now, can you 
call back later?” Bill Clinton had asked him 
to report back within 90 days on the ethi- 
cal implications of cloning. I overheard his 
name, and said, “No, we definitely want to 
talk to him.” 


Colman: When you're embedded in a project, 
you have what you consider to be good scien- 
tific reasons for doing it. Everything we did 
was covered by an ethics committee. We had 
been through a lot of concerns about animal 
health. Our concern was more about that kind 
of reaction. We weren't doing it as a prelude to 
cloning humans. 


Griffin: People in the media pressed this point 
repeatedly. We were accused of keeping Dolly’s 
birth secret because we were contemplating 
cloning a human. We had our position clear 
on that: it was unethical and unsafe. 


Wilmut: It goes with the job. You just have to 
explain this is not the case. 


Schnieke: In Europe, it was immediately seen 
as a negative. “What have they done now and 
what could they do next?” We had police at the 
institute who explained what you do if there's 
a bomb scare. Packages were being screened 
for explosives. 


Walker: I do remember Ian Wilmut’s personal 
assistant, Jackie, getting phone calls after it all 
hit the press. She had lots of phone calls, some 
of them were a bit crackpot, from people want- 
ing their dogs cloned. The sadder ones were 
those people who had lost children or who had 
illnesses themselves, and this was going to be a 
breakthrough that could cure different diseases. 


Colman: Dolly seemed to capture the imagina- 
tion. It was a furry animal. Having a name that 
was identifiable helped enormously. 


Bracken: If she’d been seen as being an animal 
that was locked away, that not many people 
saw, that could have perpetuated more bad 
publicity. But I think, because of the open- 
ness, that people were allowed to go and visit 
her and be shown around, this did help in the 
acceptance of the public. 


Griffin: She performed well for camera, and 
everybody could see she was a perfectly nor- 
mal animal. Because she was accessible and 
photogenic, she became the most famous 
sheep in the world. Any marketing manager 
would have killed for it. In some of the pictures 
it’s as if she’s interviewing the media. 


Walker: I took a photographer down to 
see Dolly. This guy produced a kid’s party 
crown, a little gold thing. I said: “I don’t 
think we should.” We were all very keen not 
to allow Dolly to become humanized. She 
was a sheep and that was it. 


Bracken: Away from the media and the 
cameras, we tried to treat her just like the other 
sheep, not as a sort of celebrity, which she 
obviously became. 


Walker: The first time she was shorn, they took 
the wool — which I have some of, actually — 
to be knitted into a jumper for a cystic-fibrosis 
charity. Have you seen her in the museum? 
She's behind a glass case now because people 
kept pinching bits of wool from her. At least I 
got my wool while she was still alive. 


Dolly lived for six and a half years and 
gave birth to several lambs herself. But 
in 2003, she began to show signs of 
illness. 


Bracken: It was Valentine's Day. I think 
it was a Friday. We knew that there was 
the potential for this lung disease to 
have developed. 


Griffin: She suffered from a disease 
called jaagsiekte. It’s a disease of the 
lungs and one or two other sheep before- 
hand had gone down with it. 


Wilmut: They thought she should be X-rayed 
over at the vet school. They were surprised 
at the size of the tumour in her lungs. We 
debated, under these circumstances, how 
hard we should struggle for her to recover. 
Wouldn’t it be kinder to just let her go? So we 
euthanized her. You are responsible for the 
welfare of the animals on your project. 


A decade later, another loss struck the scien- 
tific team with the death of Keith Campbell. 


Colman: Keith was the driving force. He was 
the person who did the important experi- 
mental work that sowed the seeds of the 
protocol we all used. Dolly would not have 
happened without Keith. 


Ritchie: Keith was, I suppose, ‘unusual’ is 
probably the thing you would say about him. 
He was quite hippy. He drove a Volkswagen 
Beetle, smoked roll-ups, had long hair. 


Colman: He didn’t have a great relationship 
with Ian. They were very different person- 
alities and often argued. 


Wilmut: I don’t remember rows. We would 
have had slightly different priorities 
sometimes. 

It's always very difficult to divide recogni- 
tion up. What was obviously the cause of some 
annoyance and some criticism is that he didn’t 
get the first authorship on the Dolly paper. He 
did get absolutely all the others. There was a 
time when he said the Megan and Morag paper 
was actually more important than Dolly. He 
definitely was frustrated that I got an FRS 
(Fellow of the Royal Society) and ultimately 
a knighthood. 


After a domestic dispute, Campbell killed 
himself on 5 October 2012. 


Colman: Keith was a very good friend of mine 
and we used to go mountain biking in Scotland 
in the evenings after work. I spoke with him 
three days before he died. I was very shocked. 


Photo: Bill Ritchie at the Roslin Institute. 


Walker: That hit me very hard, harder than 
I would have imagined. I hadn't seen him in 
many, many years. We were such a close, tight 
group at the time. We had to be. 


Colman: I went to a meeting in Paris last 
January, where they had a posthumous award. 
They took a straw poll of how many people in 
the audience had been helped by what Keith had 
done, and a huge number of people put their 
hands up. 


The techniques developed in the creation of 
Dolly were used to copy valuable livestock 
and make transgenic animals. But in biomedi- 
cal labs, Dolly hinted at a future in which cells 
could be reprogrammed to an embryo-like 
state and used to treat human diseases. 


Wilmut: The birth of Dolly turned the rules of 
development upside down, and made a lot of 
biologists think differently. 


Jeanne Loring, stem-cell biologist, the 
Scripps Research Institute, La Jolla, California: 
That was the onset of cattle cloning, 
which is actually quite popular now. There's 
a tremendous value in being able to improve 
cattle, and this gave people another tool. 


George Seidel, animal reproductive biologist, 
Colorado State University, Fort Collins: There 
are cloned bulls producing semen that’s being 
sold. There's an Angus bull called Final Answer, 
he’s got half a million offspring or something 
like that. So his clone is called Final Answer II, 
and you can buy his semen at half the price. My 
wife and I have a cattle ranch, so we use Final 
Answer II. Hell, it’s the same genetics. But 
from a theoretical standpoint, the trans- 
genic stuff is really much more impor- 
tant than just making copies. To make 
our first transgenic cow, we created 
thousands of embryos. It was a huge 
effort. A tenth of the money, a tenth 
the animals is what transgenics plus 
cloning could do for you. 


Robert Lanza, chief scientific officer, 
Astellas Institute for Regenerative 
Medicine, Marlborough, Massachusetts: 
I was excited. Now we could hopefully apply 
the same technique, not so much for animals 
and agriculture, but for treating a long list of 
human diseases. What Dolly showed was the 
enormous power of that technology and the 
magic of the egg. There were factors in the egg 
that could take adult cells backwards in time 
and restore them to an embryonic state. 


Shinya Yamanaka, stem-cell scientist, Kyoto 
University, Japan: My initial response was 
“Wow! It’s like science fiction.” But it was not 
something I was planning to work on. Judg- 
ing from the paper, the cloning process is very 
technically challenging. The next year, the first 
human embryonic-stem-cell paper came out. 
That's when I re-evaluated Dolly. I thought, at 
least in theory, we should be able to reprogram 
somatic cells back into the embryonic state so 
we can make ES-like stem cells directly from 
skin or blood cells. 


McWhir: A result like Dolly stops people in 
their tracks, and they say: “Well hang on. If 
I’d have said that is impossible, what else am I 
saying is impossible?” 


Schnieke: You have some experiments where 
it brings up your heartbeat. Dolly was one. 


Ritchie: It's kind of like having children. I 
haven't got any myself. Maybe Dolly’s that 
sort of child. 


Wilmut: It would be wrong to say my name's 
known all the way around the world — but 
Dolly’s is. 


Ewen Callaway writes for Nature from London. 


MYSTERY IN THE HEAVENS 


Ultra-powerful signals known as fast radio bursts are bombarding Earth. 
But where are they coming from? 


No astronomer had ever seen anything 
like it. No theorist had predicted it. Yet 
there it was: a 5-millisecond radio 
burst that had arrived on 24 August 2001 from 
an unknown source seemingly billions of light 
years away. 

“It was so bright, we couldn't just dismiss it,” 
says Duncan Lorimer, who co-discovered the 
signal in 2007 while working on archived data 
from the Parkes radio telescope in New South 
Wales, Australia. “But we didn't really know 
what to do with it.” 

Such fleeting radio bursts usually came 
from pulsars — furiously rotating neutron 
stars whose radiation sweeps by Earth with the 
regularity of a lighthouse beam. But Lorimer, 
an astrophysicist at West Virginia University in 
Morgantown, saw this object erupt only once, 
and with more power than any known pulsar. 
He began to realize the significance of the 
discovery only after carefully going over the 
data with his former adviser, Matthew Bailes, 
an astrophysicist at Swinburne University of 
Technology in Melbourne, Australia. If the 
source was really as far away as it seemed, it 
had released the energy of 500 million Suns in 
just a few milliseconds. “We became convinced 
it was something quite remarkable,” he says. 
But when no more bursts appeared, initial 
excitement turned to doubt. Radio astrono- 
mers have learnt to be sceptical of mysterious 
spikes in their detectors: the events can all too 


easily result from mobile-phone signals, stray 
radar probes, strange weather phenomena 
and instrumental glitches. Wider acceptance 
of what is now known as the Lorimer burst 
came only in the past few years, after observers 
working at Parkes and other telescopes spotted 
similar signals. Today, the 2001 event is rec- 
ognized as the first in a new and exceedingly 
peculiar class of sources known as fast radio 
bursts (FRBs) — one of the most perplexing 
mysteries in astronomy. 

Whatever these objects are, recent observa- 
tions suggest that they are common, with one 
flashing in the sky as often as every 10 seconds. 
Yet they still defy explanation. Theorists 
have proposed sources such as evaporating 
black holes, colliding neutron stars and 
enormous magnetic eruptions. But even 
the best model fails 
to account for all the observations, says Edo 
Berger, an astronomer at Harvard University in 
Cambridge, Massachusetts, who describes the 
situation as “a lot of swirling confusion”. 

Photo: The Parkes telescope in Australia detected the first fast radio burst in 2001. Credit: Wayne England. 

Clarity may come soon, however. Telescopes 
around the world are being adapted to look for 
the mysterious bursts. One of them, the Cana- 
dian Hydrogen Intensity Mapping Experiment 
(CHIME) near Penticton in British Columbia, 
should see as many as a dozen FRBs per day 
when it comes online by the end of 2017. 
“This area is set to explode,” says Bailes. 


CURIOUSER AND CURIOUSER 
Astronomers might have had more confi- 
dence in the Lorimer burst initially had it not 
been for a discovery in 2010 by Sarah Burke- 
Spolaor, who was then finishing her astrophys- 
ics PhD at Swinburne. Burke-Spolaor, now an 
astronomer at the US National Radio Astron- 
omy Observatory in Socorro, New Mexico, was 
trawling through old Parkes data in search of 
more bursts when she turned up 16 signals that 
shook everyone’s confidence in the original. 
In most ways, these signals looked remark- 
ably similar to the Lorimer event. They, too, 
showed ‘dispersion’, meaning that high-frequency 
waves appeared in the detectors 
a few hundred milliseconds before the low- 
frequency ones. This dispersion effect was 
the most important piece of evidence con- 
vincing Lorimer and Bailes that the original 
burst came from well beyond our Galaxy. 
Interstellar electrons in clouds of ionized 
gas are known to interact more with low- 
frequency waves than with high-frequency 
ones, which delays the low-frequency waves’ 
arrival at Earth ever so slightly, and stretches 
the signal (see ‘Flight delays’). The delay in the 
Lorimer burst was so extensive that the wave 
had to have travelled through a lot of matter — 
much more than is in our Galaxy. 
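
The size of such a delay can be estimated with the standard cold-plasma dispersion relation used in radio astronomy, in which the lag between two frequencies scales with the column of free electrons along the path (the 'dispersion measure', DM). The sketch below is illustrative only: the function name, the 1.2 and 1.5 GHz band edges and the DM of 375 pc cm^-3 are assumptions chosen for the example, not figures quoted in this article.

```python
# Illustrative sketch: dispersion delay across an observing band.
# The constant is the standard dispersion constant, rounded; the inputs
# used in the example at the bottom are assumed values, not data from
# the article.

K_DM_MS = 4.149  # milliseconds * GHz^2 per (pc cm^-3)


def dispersion_delay_ms(dm, f_low_ghz, f_high_ghz):
    """Return how many milliseconds later the low-frequency edge arrives
    than the high-frequency edge, for dispersion measure `dm` in pc cm^-3."""
    if not 0 < f_low_ghz < f_high_ghz:
        raise ValueError("require 0 < f_low_ghz < f_high_ghz")
    if dm < 0:
        raise ValueError("dispersion measure must be non-negative")
    # Delay grows as the inverse square of frequency, so the low edge lags.
    return K_DM_MS * dm * (f_low_ghz ** -2 - f_high_ghz ** -2)


if __name__ == "__main__":
    # Assumed example: DM = 375 pc cm^-3 across a 1.2-1.5 GHz band.
    delay = dispersion_delay_ms(dm=375.0, f_low_ghz=1.2, f_high_ghz=1.5)
    print(f"Low-frequency edge lags by about {delay:.0f} ms")
```

With these assumed inputs the lag comes out at roughly 389 milliseconds, in line with the 'few hundred milliseconds' described above. Because the delay is proportional to DM, doubling the amount of ionized gas along the path doubles the lag, which is why a large delay points to a long journey.
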
Unfortunately for Lorimer and Bailes’ peace 
of mind, Burke-Spolaor’s signals also showed 
a crucial difference from the original: they 
seemed to pour in from everywhere, not just 
from where the telescope was pointing. Dubbed 
perytons, after a mythical winged creature that 
casts a human shadow, these bursts could have 
been caused by lightning, or some human-made 
source. But they were not extraterrestrial. 
Lorimer decided to postpone his research 
into FRBs for a while. “I didn’t yet have tenure,” 
he says, “so I had to go back and do more 
mainstream projects, just to keep my research 
moving.” Bailes and his team kept going, and 
upgraded the Parkes detector’s time and fre- 
quency resolution. In 2013, they turned up four 
new FRB candidates that resembled the Lorimer 
burst. But some outsiders remained sceptical 
that the signals were really coming from 
space, not least because all the FRBs thus far 
had been seen by one team using one telescope. 
“I was desperate for someone else to find them 
somewhere else,” says Bailes. 

In 2014, his wish was finally granted. A 
team led by astronomer Laura Spitler at the 
Max Planck Institute for Radio Astronomy 
in Bonn, Germany, published their observa- 
tions of a burst at the Arecibo Observatory in 
Puerto Rico. “I was ridiculously overjoyed,”
says Bailes. 

The Arecibo discovery convinced most 
people that FRBs were the real deal, says Emily 
Petroff, who is now an astrophysicist at the 
Netherlands Institute for Radio Astronomy in 
Dwingeloo. Yet, as long as the Burke-Spolaor 
signals went unexplained, they cast a shadow 
of doubt. “At any talks I would give,” says 
Petroff, “someone would say, ‘But what about
perytons?’” So in 2015, while still a graduate
student at Swinburne, she led a hunt to track 
down the source of perytons once and for all. 

First, Petroff and her team used the upgraded 
Parkes detector to pinpoint when the bursts 
were happening: at lunchtime. “Immediately I 
thought, ‘This isn’t weather,’” says Petroff. Then
came another peryton at a suspiciously famil- 
iar radio frequency, which led the team to run 
experiments in the staff kitchen. Perytons, they 
discovered, were the result of scientists opening 
the microwave oven mid-flow. But the Lorimer 
event was in the clear: records showed that at 
the time of the burst, the telescope had been 
pointed in a direction that would have blocked 
any microwave signal from the kitchen.

“So then I worried, maybe they've just got 
a different brand of microwave at Arecibo,”
says Bailes, whose team at Parkes had, by then, 
racked up 14 separate bursts. He did not relax 
completely until later in 2015, when a burst was 
spotted at a third facility — the Green Bank Tel- 
escope in West Virginia. That burst had another 
quality that supported an extraterrestrial ori- 
gin: its waves were rotated in a spiral pattern 
— which results from passing through a mag- 
netic field — and were scattered as if they had 
emerged from a dense medium. “There's no way 
that’s a microwave oven,” Bailes told himself.


BURSTS OF INSPIRATION 

But that still leaves the question of what the 
FRBs actually are. The extreme brevity of the 
signal, just 5 milliseconds, implied that the 
source must be a compact object no more than a 
few hundred kilometres across — a stellar-mass 
black hole, perhaps, or a neutron star, the compact
core left over by a supernova. And the fact
that Earth-based telescopes can detect the FRBs 
at all means that this compact source somehow 
puts out an immense amount of energy. But 
that still leaves a long list of candidates, from 
merging black holes to flares on magnetars: rare 
neutron stars with fields hundreds of millions of 
billions of times stronger than the Sun’s.

An important clue arrived earlier this year 
when Spitler’s team reported that at least one 
FRB source repeats: data from Arecibo revealed 
a flurry of bursts over two months, some spaced 
just minutes apart. That behaviour has been
confirmed by the Green Bank telescope, which 
detects signals in a different frequency band.
Until then, each of the observed FRBs had been 
a one-off event, which hinted at cataclysmic 
explosions, or collisions in which the sources 
were destroyed. But a repeating FRB implies 
the existence of a source that survives the pulse 
event, says Petroff. And for that reason, she 
says, “I would assume it would be something 
to do with a neutron star” — one of the few 
known objects that can emit a pulse without 
self-destructing. 

Spitler agrees. As an example, she points 
to the Crab nebula: the result of a supernova 
explosion that was observed from Earth in 
1054 and left behind a rapidly spinning pulsar 
surrounded by glowing gas. The Crab pulsar 
occasionally releases extremely bright and nar- 
row radio flares, Spitler says. And if this nebula 
were in a distant galaxy and hugely boosted 
in energy, its emissions would look like FRBs. 

If one source repeats, Spitler says, the simplest 
interpretation is that they all do, but that other 
telescopes haven’t been sensitive enough, or
lucky enough, to see the additional signals.
Yet others think that perhaps only some are 
repeating. “I wouldn't be surprised if we end 
up with two or three populations,” says Petroff. 


A LONG WAY HOME

Another crucial question is how far away the 
FRBs are. The 20 bursts seen so far seem to be 
scattered randomly around the sky rather than 
being concentrated in the plane of the Galaxy, 
which suggests that their sources lie beyond 
the borders of the Milky Way. 

And yet to Avi Loeb, a physicist at Harvard 
University, such vast distances imply an implau- 
sibly large energy output. “If you want the burst 
to repeat, you won’t be able to destroy the source
— therefore, it cannot release too much energy,” 
he says. “That puts a limit on how far away you 
can put it.” Perhaps, he says, the FRB sources
are neutron stars in our own Galaxy, and the 
dispersion is mostly the result of still unknown 
electron clouds that blanket them. 

But others suggest that such a dense cloud 
in the Galaxy should be visible in other wave- 
lengths. At the California Institute of Tech- 
nology (Caltech) in Pasadena, astrophysicist 
Shri Kulkarni has scoured data from several 
telescopes for a galactic source and turned up 
nothing. Kulkarni, who directs Caltech’s
optical observatories, initially argued for 
galactic FRBs, and even made a US$1,000 
bet on it with astronomer Paul Groot of 
Radboud University Nijmegen in the 
Netherlands. Now, he finds the evidence 
for extragalactic FRBs to be overwhelm- 
ing, and has agreed to settle the bet — 
grudgingly. “I think I will pay him in $1 
bills,” he says. 

Still, Kulkarni hasn’t ruled out the 
possibility that the FRB sources lie in gal- 
axies that are perhaps a billion light years 
away, rather than many billions. Such a 
distance would still require at least some 
of the signal dispersion to come from 
electron clouds in the host galaxy, he 
says. But closer FRBs would not have to 
be so energetic. “It takes them from being 
amazingly exotic, to just exotic,” he says.

The answer could mean a great deal to 
observers. If the FRB signals have trav- 
elled through local plasma clouds, they 
could give weather reports from neigh- 
bouring galaxies. But if they are truly 
cosmological — coming from halfway 
across the visible Universe — they could 
solve a long-standing cosmic mystery. 

For decades, astronomers have known 
from observations of the early Universe 
that the cosmos should contain more 
everyday matter — the kind made up of 
electrons, protons and neutrons — than 
exists in the visible stars and galaxies. 
They suspect that it lies in the cold inter- 
galactic medium, where it is effectively 
invisible. But now, for the first time, 
the dispersion of the FRB signals could 
enable them to measure the medium’s 
density in any given direction. “Then, we 
have essentially a surgical device to do 
intergalactic tomography,” says Kulkarni.


RAPID-FIRE DETECTION 
First, however, astronomers have to find 
a lot more FRBs and pin down their locations.
“Until then, we are stumbling in
the dark,” says Berger. 

One way to accomplish that is to 
extract the FRBs from radio-telescope 
data in real time, so that scientists at other 
observatories can observe the bursts in 
multiple wavelengths. Since last year, the 
Parkes team has been doing this by boosting the 
observatory’s in-house computing power, and 
scientists at Arecibo hope to follow suit this year. 
In February, the strategy seemed to be paying off 
when an independent team followed up within 
two hours of an FRB’s detection at Parkes. The 
team tentatively pinpointed the burst to a spe- 
cific galaxy almost 6 billion light years away. 
Further observations cast doubt on that inter- 
pretation. But even so, says Lorimer, the method 
is sound and may pay off in the future.

FLIGHT DELAYS

Astronomers are not sure what causes fast radio bursts (FRBs).
But as the waves reach Earth, low-frequency ones lag behind
high-frequency ones. The extent of this delay suggests that the
signals have travelled through intergalactic space for potentially
billions of light years.

SOURCE
An unknown event emits a huge burst of radio waves over a range of
frequencies simultaneously.

Electron clouds between the galaxies interact with the waves,
stretching and slowing the lower frequencies more strongly than the
higher ones.

A telescope on Earth measures the delay and stretch, enabling
astronomers to estimate how far the signals have come.

SIGNAL
The signal is lost in the noise until the telescope’s output is
separated into frequency bands. This reveals a cascade of peaks that
corresponds to the dispersion of the burst.

Other observers are putting their hopes in
new telescopes. In 2014, astrophysicist Victoria
Kaspi at McGill University in Montreal, Canada,
submitted a proposal to adapt CHIME, which
was originally designed to map the expansion


next year, says Kaspi, ultimately finding 
more than a dozen per day. 

In Hoskinstown, Australia, meanwhile, 
Bailes and his colleagues are refurbishing 
the 1960s-vintage Molonglo Observatory 
Synthesis Telescope, turning it into an 
FRB observatory with a single half-pipe 
16 times longer than CHIME’s, although
one-quarter as wide. The team has already 
found three as-yet-unpublished FRBs 
with the facility working at only about 
20% of its final sensitivity, says Bailes. 

Another strategy for locating the FRB 
sources is to work with existing facilities 
such as the Very Large Array: an ‘inter- 
ferometer’ that uses the time difference 
between signals from 27 radio telescopes 
spaced across 36 kilometres of grassland 
near Socorro, New Mexico, to create a 
single, high-resolution image. Sometime 
in the next year or so, says Lorimer, the 
array could detect an FRB and locate its 
home galaxy. “Ultimately, that could set- 
tle a lot of arguments and bets,” he says. 
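
The reason an array spread over tens of kilometres can hope to tie a burst to a single galaxy comes down to the diffraction limit: angular resolution is roughly the observing wavelength divided by the aperture, or for an interferometer by its longest baseline. The sketch below is a minimal back-of-envelope calculation rather than anything from the article. The 36-kilometre baseline is the figure quoted above, whereas the 1.4 GHz observing frequency and the 64-metre single dish used for comparison are assumptions introduced only for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def diffraction_limit_arcsec(freq_hz, aperture_m):
    """Approximate angular resolution in arcseconds, taken as wavelength / aperture."""
    wavelength_m = SPEED_OF_LIGHT_M_S / freq_hz
    return math.degrees(wavelength_m / aperture_m) * 3600.0


FREQ_HZ = 1.4e9           # assumed observing frequency (not from the article)
ARRAY_BASELINE_M = 36e3   # array extent quoted in the text
SINGLE_DISH_M = 64.0      # hypothetical single-dish diameter for comparison

array_res = diffraction_limit_arcsec(FREQ_HZ, ARRAY_BASELINE_M)
dish_res = diffraction_limit_arcsec(FREQ_HZ, SINGLE_DISH_M)

print(f"Array: {array_res:.2f} arcsec, single dish: {dish_res:.0f} arcsec")
```

Under these assumptions the array resolves about 1.2 arcseconds, several hundred times finer than the single dish, which is the difference between a patch of sky containing many galaxies and one small enough to single out a host.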

Kulkarni, meanwhile, is leading two 
projects. The first uses ten 5-metre- 
wide dishes in an array that can see and 
locate only super-bright FRBs, but that 
makes up for its low sensitivity by peer- 
ing at a huge swathe of sky. The second 
takes the principle to the extreme, using 
2 antennas spaced at observatories 
450 kilometres apart that will see only 
the very brightest FRBs, but that are able 
to examine half the sky at once. That 
would enable it to catch the rare FRBs 
that presumably exist within our own 
Galaxy, but whose extreme brightness 
existing telescopes are not designed to 
see. “Most facilities would just discount 
it as interference,” says Kulkarni. 

If FRBs do turn out to come from 
cosmological distances, says Loeb, their 
identification would be a major break- 
through, potentially unravelling a new 
class of source that could be used to probe 
the Universe's missing matter. But then, 
he says, FRBs could also be something 
that no one has thought of yet: “Nature 
is much more imaginative than we are.”


Elizabeth Gibney is a reporter for 
Nature in London. 
Make climate-change
assessments more relevant 


Stéphane Hallegatte, Katharine J. Mach and colleagues urge researchers to gear their 
studies, and the way they present their results, to the needs of policymakers. 


With the ink just dry on the Paris
climate agreement, policymakers
want to know how they can act
most effectively. Ambition is high: the long-term
goal is to keep the average warming of
the planet to well below 2°C, and even to 
1.5°C. Governments, corporations and communities
have many options for minimizing
dangerous climate change, and must choose
between conflicting priorities and objectives. 
For example, how should governments 
decarbonize energy while increasing access 
to it without resorting to fossil fuels? 

No single approach will work for all. 
The risks and impacts of climate change 
differ by place and time. Local values and 
contexts matter. Small islands are vulnerable
to sea-level rise, for example, and fossil-fuel 
exporters will lose profits from the transi- 
tion to low-carbon energy. We must consider 
value judgements, such as the relative impor- 
tance of economic damage versus biodiversity 
loss, as well as inequality and fairness. 
And the relevant climate and social 
sciences are themselves diverse, from
studies of the physics of storm formation 
to investigations of the role of heritage in 
cultural identity. The challenge for those 
who assess such scientific knowledge, such 
as the Intergovernmental Panel on Climate 
Change (IPCC), is to summarize results in 
ways that are true to the original research, 
explicit about the values and judgements in 
the analysis, and digestible by and useful to 
policymakers and the public. 

For example, the IPCC’s 2014 Synthesis 
Report encapsulated factors from climate 
to ecology to technology into a single 
figure (Figure SPM.10). This illustrates
how long-term global risks are linked to 
emissions-reduction requirements under 
different physical, policy and risk scenarios. 
Such a figure, although an achievement, 
can convey only a glimpse of the complex 
analysis that went into it. 

In the IPCC’s sixth cycle of assessment, 
the climate-science community needs to 
supply the right sorts of information to help 
decision-makers to construct policies from 
myriad mitigation and adaptation options. 
Producing this information will require 
more multidisciplinary research, updated 
strategies for communicating uncertainty 
and studies of a broader range of climate and 
risk projections that include the impacts of 
policy responses. 

Here, we set out four steps to putting 
policy relevance at the core of both research 
and assessment. 


Integrate disciplines from the start. The 
range of risks summarized in the IPCC’s 
2014 Synthesis Report was limited by the 
research available. For example, the assess- 
ment highlighted increasing risks of climate 
extremes but said little about how climatic 
hazards interact with societal vulnerabil- 
ity. Sparse information on how risks evolve 
at specific warming levels resulted in the 
reporting of broad, qualitative levels of 
risk — for example, ‘undetectable’ to ‘very 
high’, as judged by experts. But comparison
across risks was difficult. 

Climate scientists need to close these 
gaps by scrutinizing the feedbacks between 
development pathways, climate change 
and its impacts and risks, and policies and 
responses. The community has created 
socio-economic scenarios that are better able 
to combine climate-policy consequences and 
climate-change impacts in certain areas — 
such as how poverty reduction reduces 
vulnerability to extreme events — and to 
investigate their interplay with develop- 
ment trends ranging from population to 
land-use trajectories. But covering many climatic
and societal futures, globally to locally,
is a monumental task. Projects that compare
assumptions and results between different 
models are a start, but need to include more 
evidence and expert judgements across
disciplines. 

Research and assessments must be 
designed to solicit and answer questions 
crucial to decision-making. For example, 
how do risks and requirements compare for a 
climate goal at 1.5°C, 2°C or more? How 
can we avoid locking in to carbon-intensive 
development pathways and keep open 
options for rapid decarbonization? How 
can the effectiveness of adaptation actions 
be ensured? And how can emissions be 
reduced without slowing the pace of poverty 
reduction? 


Explore multiple dimensions. Risks from 
a changing climate and responses to it vary 
dramatically from place to place, through 
time and with different levels of adaptation 
and mitigation. Projections of increases in 
sea level for different emissions scenarios, 
for example, range from tens of centimetres 
to more than 10 metres over centuries to 
millennia. Small islands might quickly face
inundation whereas large countries would 
have more time to adapt. Past assessments 
focused on characterizing a few alternative 
futures (such as continued high emissions 
versus ambitious mitigation) rather than 
weighing up the risks and benefits of limiting 
warming across a ladder of possible targets: 
1.5°C, 2°C, 2.5 °C or higher. 

A broader census of differences through 
space and time would strengthen the infor- 
mation foundation for policymaking. 
Decision-makers with different goals could
select portfolios of responses, for example, 
based on risks to all, risks to the most vul- 
nerable, risks of economic damages, risks of 
irreversible changes or a combination. 

The distribution of losers and winners — 
regarding policies and impacts as well as 
people and places — needs to be studied. 
For example, the destruction of coral reefs 
affects fishing communities and may add 
to stresses, especially in places with weak 
governance. In some high-latitude areas, 
by contrast, a warming climate will bolster 
agricultural yields. Building sea walls could 
reduce coastal flood risks but threaten 
ecosystems, historical heritage and land- 
scape beauty. Risks and opportunities from 
investments in mitigation options need to be 
evaluated. For example, expanding biomass 
energy may reduce (or reverse) emissions 
but could also threaten food production 
and biodiversity. Renewable energy reduces 
emissions and provides electricity more 
cheaply than that from fossil fuels in many 
remote locations, where some of the poorest 
people live. 

More research is needed on regional 
challenges and opportunities that go beyond 
the use of a single metric (global mean
warming) as a proxy for climate change and
its impacts. For example, ocean acidification
and sea-level rise are not linearly related to 
peak temperature, and the risks that they cre- 
ate require more detailed investigation. And 
reducing emissions of short-lived climate 
pollutants such as soot and tropospheric
ozone precursors might not change peak 
warming, but would slow the rate of warming
globally; this would allow more time for
ecosystems and societies to adapt, as well as 
provide local health benefits. 


Consider uncertainty. Decision-makers 
need to appreciate a wide range of possi- 
ble outcomes, including uncertainty in the 
consequences of global climate policies. Four 
aspects of uncertainty must be evaluated and 
communicated: probability ranges that can 
be narrowed with future research; unknowns 
that are linked to a deep lack of knowledge; 
uncertain reactions that depend on societal 
decisions and geopolitical events; and other 
areas of uncertainty that reflect random or 
chaotic features of the climate system. 

The implications of these uncertainty 
types for policymaking and research need 
to be untangled. Those that relate to under- 
lying Earth-system processes, such as 
climate mechanisms that we do and do not 
understand, or the inherent variability of the 
climate system, can be addressed through 
research that increases understanding of 
climatic hazards. Extreme events and result- 
ing damages lie in the tails of probability 
distributions that are inherently difficult to 
quantify or even characterize qualitatively. 

Uncertainty need not be a bad 
thing. Uncertainties related to human
choices, such as the multiple pathways to
achieve a climate goal, can offer flexibility.


For example, much of the uncertainty in the 
relationship between emissions in 2050 and 
eventual temperature rise stems from the 
possibility of compensating for modest short- 
term emissions reductions with larger efforts, 
including negative emissions, in later decades. 

An awareness of the diversity of options 
and their risks is important for making 
smart policies that allow for regular revisions 
in light of new information and feedback. 
More ambitious near-term emissions reduc- 
tions create more flexibility for responses 
through the century, depending on whether 
useful and affordable technologies become 
available and how climate impacts pan out. 
Less mitigation early on would constrain 
options later and compound risks. Short-term
actions, such as the commitments
for 2025 or 2030 that countries have made
towards the Paris Agreement, can be compatible
with a range of long-term targets,
depending on the ambition of our efforts 
later in the century. 

Assessing whether current policies are 
consistent with long-term goals depends on 
many factors that are impossible to predict 
with confidence. And not knowing how
people will respond makes such an assess- 
ment even harder. So emissions pathways 
that seem compatible today with a long-term 
temperature target could lead us to higher — 
or lower — levels of warming, depending 
on everything from future global climate 
policies to technology costs to the climate 
sensitivity of the Earth system. Intensified
focus on limiting global warming to 2°C or 1.5°C
decreases the risk of greater warming in the long term,
for example a rise exceeding 3 °C, should available
technologies turn out to be limited or
climate sensitivity higher than expected.
Researchers need to assess how different 
sources of uncertainty affect decision- 
making, especially in worst-case scenarios. 
What should we do if temperatures start to 
rise more rapidly or the impacts are more 
dangerous than we expect? How can we 
detect such departures and how should we 
alter course? Climate policies might prove 
to be harmful and need revising; technol- 
ogy costs might not fall; carbon capture and 
sequestration might not work. 




Inform holistic solutions. A fuller 
evaluation of risks and options is needed 
that includes those created by climate- 
change responses for other policy goals. 
For example, the assessment of climate- 
change risks at 1.5°C in the IPCC’s 2014 
Synthesis Report foresaw impacts on coral 
reefs, Arctic sea ice, water availability, food
production and sea-level rise. But the bigger 
picture should also include issues related to 
climate mitigation, such as economic duress, 
land- and water-use trade-offs and calls for 
high-risk geoengineering methods. 

The impacts of climate changes and 
climate policies will interact if, for instance, 
a slower reduction in poverty owing to 
higher energy costs increases vulnerability. 
Synergies and trade-offs must be evaluated, 
including risks arising from mitigation 
actions — not just inactions. Social and cli- 
mate scientists must investigate the political 
and socio-economic impacts of climate poli- 
cies (short- as well as long-term), the distri- 
bution of those who benefit and those who 
are adversely affected, and the influences of 
powerful interest groups. 

It is important to explore how climate 
responses can advance the Sustainable 
Development Goals and especially poverty 
reduction. For instance, improving access
to clean energy and decreasing the economic 
impacts of extreme weather events can accel- 
erate development progress while protect- 
ing poorer nations against climate change. 
Climate action and protection will never be 
the sole priorities for decision-makers, but 
they will be integral to the full policy land- 
scape. Research and assessment can create a 
powerful foundation for these interactions, 
and empower decisions in the years ahead.
Migrant Somalis crowd the night shore of Djibouti trying to capture inexpensive mobile-phone signals.


Social-progress panel 
seeks public comment 


Marc Fleurbaey and colleagues explain why and how 300 scholars in the social 
sciences and humanities are collaborating to synthesize knowledge for policymakers. 


International panels of experts have
proliferated to marshal scientific knowledge.
There are panels on climate change,
biodiversity, chemical pollution, food
security and nuclear proliferation. All are
concerned with long-term issues that have
profound economic, social, political and
cultural ramifications.

Those issues, and the uncertainties around
them, represent unprecedented challenges
for our societies. Many of the obstacles to
the identification and implementation of
solutions to the ‘wicked problems’ (those
that are multifactorial and exceptionally
complex) come from inertia and misalignment
in institutions, conventions and forms
of collective action. Meanwhile, we are still
facing many classic threats: war, violence
and terrorism produce major disruptions
and instabilities, while widening inequalities
put increasing strains on social cohesion.

All this questions our collective capacity to
deliver on global sustainability goals and to
ensure a viable future for subsequent generations.
Responses, so far, do not live up to the
urgent and critical nature of the challenges. 
Impressive efforts have been made around 
the adoption of the United Nations’ Sustain- 
able Development Goals and the Paris climate 
agreement. However, dysfunctional, short- 
sighted policymaking in many countries 
is profoundly worrisome, as is foundering 
cooperation in international arenas such as 
the World Trade Organization and the Euro- 
pean Union, and the return of anti-demo- 
cratic trends. The absence of a positive and 
cohesive long-term vision of what we could 
collectively aim for is one key factor responsi- 
ble for this helplessness and impotence. 

That vision is the mission of a new panel 
convened last year, the International Panel 
on Social Progress (IPSP). It comprises more 
than 300 social-science and humanities 
scholars coordinated by the Fondation Maison
des Sciences de l’Homme in Paris and
by Princeton University in New Jersey. The
IPSP is preparing a report on directions that
could be taken in the twenty-first century to
create better societies. We are members of
the panel’s steering committee, and two of us
(R.K. and H.N.) are co-chairs of its scientific
council. In the next few months, the IPSP
will release the first draft of its report.

© 2016 Macmillan Publishers Limited. All rights reserved.

We call on researchers, policymakers, 
think tanks, companies, non-governmental 
organizations (NGOs) and citizens to provide 
us with feedback during the comment period. 
From August to December 2016, interested 
parties will be able to weigh in on the panel 
website, www.ipsp.org, which will host a com- 
ment platform, discussion forums and sur- 
veys. Informed by these views, we hope that 
the final report will reflect an open and broad 
international debate on ‘mobilizing utopias’.


SYNTHESIS REPORT 

Modern science and technology have been 
nurtured by a fervent belief that they lead to 
social progress. It has become clear that the
relationship is more complex. Considerable
developments in the social sciences and the 
humanities since the Second World War have 
brought a much better understanding. For 
instance, the virtues and limitations of mar- 
ket economies and public interventions have 
been extensively scrutinized at the intersec- 
tion of economics, political science, sociology 
and anthropology. The drivers of inequality 
and its possible remedies are still debated, but 
that discourse is now much more advanced, 
thanks in particular to better data. 

These developments have coincided with 
growing specialization between and within 
disciplines, and an increasing awareness of 
the diversity of regional perspectives. This 
makes it impossible for a single scholar or 
even a small group of experts to synthesize 
the accumulated corpus of knowledge. 

Creating such a synthesis that will be 
accessible to policymakers and social actors 
therefore requires a large, coordinated effort. 
The IPSP brings together scholars from eco- 
nomics, sociology, political science, law, 
anthropology, history, science and technol- 
ogy, and philosophy. The panel includes 
representatives from around the world, with 
about 40% of them women. The geographi- 
cal composition of the panel is proportionally 
representative of national academic out- 
put, which unfortunately means that non- 
Western countries are under-represented 
relative to their populations, and many of 
the representatives they do have work in 
developed countries. The panel is split into 
22 thematic chapters and five cross-cutting 
groups (science and technology, gender, 
migrations, health and social movements). 

The chapters are grouped into three parts. 
One on socio-economic transformations 
examines economic growth and environ- 
mental constraints, inequalities, labour, 
urbanization, markets, corporations and 
the welfare state. The part on governance 
explores trends and options for democratic 
institutions, the rule of law, global govern- 
ance, multinational organizations, violent 
conflicts, as well as the evolving forms of 
communications and media. The third 
part studies the sources and consequences 
of transformations in cultures and values, 
religions, families, health and education, as 
well as identities and social bonding. 

Issues addressed include: the shift from pro- 
duction and consumption to well-being; the 
importance of urban design in shaping social 
relations; the transformation of the role of the 
welfare state; the questioning of democratic 
institutions in a globalized world and the con- 
tested diversification of family and sexuality. 


BROAD INFLUENCE 

This broad scope will make it possible, in the 
final report, to propose a systemic perspec- 
tive on the evolution of societies in the world. 
Such a bird’s-eye view has been largely left to
the media and popular pundits, and needs
to be reclaimed by the scholarly community 
through the mobilization of its expertise. 

Members of the IPSP hope to influence var- 
ious audiences and processes. The main goal 
is to enter into a dialogue with citizens, social 
actors (NGOs and think tanks) and policy- 
makers, providing them with useful ideas 
that can enrich the debates, and guide actions. 

Another goal is to reach researchers as 
well as local, national and international 
research organizations. The Intergovern- 
mental Panel on Cli- 


mate Change (IPCC), “Diversity of 
for instance, has gal- approaches 
vanized research in isanasset 
climate science and ratherthana 


policy through a simi- 
lar contribution. The 
IPSP report will provide a critical review of 
the literature on social progress, identifying 
areas of consensus, controversial points and 
knowledge gaps. The effort will also propose 
innovative insights, including alternative 
policy narratives and ways to frame problems. 
For instance, the reduction of inequalities is 
usually discussed in terms of income tax and 
wealth redistribution, but can also be pursued 
by governance of the labour market and new 
revenue from environmental policy. 

To reach this diverse audience, the IPSP 
will produce a variety of outputs. The three- 
volume report will bring together the work 
ofall 22 chapters. A smaller book, written by 
a small team and for a general audience, will 
distil the main narrative and conclusions. We 
will also produce a policy toolkit of recom- 
mended actions for all types of actors, as well 
as video interviews and talks. This approach 
to diffusion emulates bodies such as the 
Organisation for Economic Co-operation 
and Development (OECD), which has, for 
instance, an interactive website for its Better 
Life Index (www.oecdbetterlifeindex.org). 




PROS AND CONS 
The IPSP process may be useful to other 
panels. The report will make recommen- 
dations — in contrast to the IPCC, which 
avoids prescriptive language (although 
special efforts were made in its latest report 
to clarify the ethical issues and value judge- 
ments involved in policymaking’). To 
respect the diversity of views among readers 
and users, every IPSP recommendation will 
be associated with a clear acknowledgment 
of the underlying values and assumptions. 

Can the panel have an impact on decision- 
makers? Unlike the IPCC, the IPSP is not a 
government-led institution. It is a bottom- 
up initiative coordinated by social scientists 
and scholars who are free to delineate the 
scope of their analysis. This independence 
has pros and cons. 

On the plus side, the report can be 
more nimble and frank than those of the 
IPCC, say, not having to be approved by
governments. We hope that such freedom 
will be an advantage for reaching a larger 
set of actors — such as NGOs, trade unions, 
think tanks and activists. And the consulta- 
tion exercise in the second half of this year 
will include meetings with decision-makers 
such as United Nations organizations, the 
World Social Forum, the World Economic 
Forum, the OECD and the World Bank. 

Yet the IPCC-style official approval process
does provide an institutional space
where experts and decision-makers air their 
perspectives’. Without this structure, there is 
a risk that the report won't meet the demand of
potential users. Finally, the IPSP relies entirely 
on the goodwill of busy scholars and their 
willingness to make time for this collective 
endeavour without any financial compensa- 
tion. Many of them contribute part of their 
institutional budget to cover some of their 
costs, a remarkable proof of commitment. 

Can such a diverse group produce a strong 
message? The goal of laying out the ‘state of 
the art’ in contested domains is not unrea- 
sonable: existing panels tackle social and 
economic policy issues (for example, the 
IPCC working groups on climate adaptation 
and mitigation policies), and controversies 
also arise around more technical subjects 
such as nuclear proliferation or biodiver- 
sity*. At the IPSP, we consider that disagree- 
ment requires panellists to focus on objective 
reviews of ongoing debates rather than seek- 
ing consensus at the cost of substance and 
depth. Diversity of approaches is an asset 
rather than a liability. 

The panel is itself an experiment in whether 
the social sciences can, in this format, make 
a difference in the quest for social progress. 
That requires not only academic input, but 
also a broad, open and lively public debate. 
Please have your say!
A US Predator drone in Kandahar, Afghanistan.


Death by remote control 


Ann Finkbeiner examines a study that probes how drones have ‘remixed’ warfare. 


A drone is a good way to kill someone.
A pilotless aeroplane watches
from above, targets and shoots with 
precision, and disappears into the sky. The 
killer is never in danger. Drones can fly at 
160-300 kilometres per hour at altitudes of 
up to 15,000 metres, hover over a target for 
hours, ‘see’ people and objects (clearly enough 
to read a car number plate from 3 kilometres 
up), and detect mobile-phone signals. At one- 
tenth to one-hundredth of the price of fighter 
planes, drones are practically disposable. 
Little surprise, then, that by 2012, one-third 
of the US Air Force's aircraft were drones, and 
half of the pilots in the Air Force were being 
trained to fly them. The US military has used 
drones for fighting in Afghanistan, Iraq, 
Yemen, Somalia, Libya, Pakistan and the Phil- 
ippines. This has “remixed” warfare, anthro- 
pologist Hugh Gusterson argues in Drone, an 
overview of the implications and ethics.
It is so asymmetrical
that it resembles hunt- 
ing, he writes — a “new 
form of state violence”,
harder to define and 
control with national 
and international laws. 
As such, he argues, “the 
drone is an inherently 
colonialist technology 
that makes it easier for 
the United States to 
engage in casualty-free 
and therefore debate-free intervention”.
Drone operation is bizarre. US pilots sit in 
grubby trailers in front of computers some- 
where in the depths of the Nevada desert; 
Gusterson calls them “stick monkeys”. The 
screens show live videos of roads, com- 
pounds, people — views also seen by intel- 
ligence analysts, commanders and military 
lawyers, all of whom inform the decision to 
fire. Then a ‘sensor operator’ aims a laser 
at the person or vehicle targeted, the drone 
pilot triggers a missile that follows the laser, 
and 15-30 seconds later, they watch the
infrared flash, the flames and the incinera- 
tion of the enemy. 

Some might imagine that this precision 
makes drones more humane compared to, 
say, the bloodbath of the Second World War's 
Battle of the Bulge. But it does not prevent 
civilians from being killed: if someone wan- 
ders within screen view at the last second, 
or a targeted soldier passes his phone to a 
relative, it can happen. Those who live under
circling drones — interviewed by foreign 
correspondents or aid workers — report 
chronic terrorization. And according to a 
study by the US Department of Defense, 
Gusterson recounts, half of the US pilots, 
despite being on the other side of the planet, 
have “high levels of operational stress” and 
can experience post-traumatic stress disor- 
der. Targeted killing ultimately means only 
that fewer civilians die. Nobody knows how 
many: estimates depend entirely on who 
does the estimating. 

The standard rules of warfare no longer 
apply. The Geneva Conventions and their 
protocols — established from 1949 to 2005 
to protect civilians, among others — define
battlefields and combatants. But insurgent 
wars have no clear battlefields and no full- 
time, uniformed fighters. Combatants are 
the people listed as targets for drone strikes. 
The list, Gusterson writes, is “maintained 
by U.S. military and intelligence agencies”,
and includes people not known to be terror- 
ists but possessing that profile. A person is 
added after analysts balance their impor- 
tance against the number of civilians likely 
also to be killed. Gusterson wonders about 
the line between legal targeted killing and 
illegal assassination. 

Although 76 countries use drones, Guster- 
son mentions only the United Kingdom, 
Israel, Iran and the United States, focusing on 
the last. He writes that he doesn't intend this 
book as an argument for or against drones 
and that he wants only a debate on regulating 
their use, yet he is clearly against them. 

A reader trying to decide whether drones 
are a more humane form of weaponry, a 
hunting party, a neocolonial ploy or all 
three at once, will want to look closely at 
the author's sources. Anthropologists have 
traditionally gone into the field, meeting, 
observing and listening to their sources. 
Gusterson’s sources are predominantly journalists,
memoirists and authors who report regularly on the
field or are veterans of it. Some anthropologists do
analyse a culture remotely, on the basis of others’
experiences and impressions. As a journalist, I would
rather do my own reporting and not hover drone-like
over the field that others have defined.

A drone operator in Nevada.


Ann Finkbeiner is a freelance science 
writer in Baltimore, Maryland, and author 
of The Jasons. She blogs at The Last Word 
on Nothing, www.lastwordonnothing.com. 
e-mail: anniekf@gmail.com 
Cities for a Small Continent: International Handbook of City Recovery
Anne Power POLICY (2016) 

Many of Europe’s storied cities have seen more bust than boom for 
decades, writes urban-sustainability specialist Anne Power. Yet a 
number have risen reinvented, and in this brilliant analysis, Power 
shows how. She follows the march of seven “Phoenix cities” with 
strong industrial legacies, from Sheffield, UK, to Turin in Italy, as they 
weather upheavals and de-industrialize with the aid of major public 
investment. These conurbations should be seen, she argues, as the 
vanguard in the low-carbon transformation outlined by economist 
Nicholas Stern (M. Grubb Nature 520, 614-615; 2015). 


Anatomy Museum: Death and the Body Displayed 

Elizabeth Hallam REAKTION (2016) 

Pickled in formalin, stripped down to articulated skeletons or 
depicted in wax or plastic, human anatomical remains have 
educated generations of medics and fired the public imagination. 
Anthropologist Elizabeth Hallam uses the Anatomy Museum at the 
University of Aberdeen, UK, to anchor a history of such collections as 
“synoptic mazes” — labyrinthine summations of knowledge. Hallam 
charts their convoluted chronicles of acquisition, dissection and 
preservation, weaving in a narrative on the cultural display of death, 
from ancient ossuaries to plastinated bodies. 


The Art of Flight 

Fredrik Sjoberg (Translated by Peter Graves) PARTICULAR (2016) 
Entomologist Fredrik Sjoberg’s best-selling memoir The Fly Trap 
(Particular, 2014) marked him as a maestro of the episodic. Here, he 
completes a trilogy with two books in one — “accidental journeys” by 
fellow Swedes whose omnivorous curiosity rivalled his own. The Art 
of Flight focuses on Gunnar Widforss, exalted in the United States for 
his haunting landscape paintings of national parks. The Raisin King 
tackles polymath Gustav Eisen, who studied earthworms, Anopheles 
mosquitoes and viticulture, brought avocados to California and 
sparked the founding of Sequoia National Park. A joy. 


The Switch 

Chris Goodall PROFILE (2016) 

The world is poised for the solar revolution, argues energy writer Chris 
Goodall in this nippy, number-crunched study of the science behind 
the “switch”. Noting that solar farms will have to cover 1% of Earth’s 
surface by 2050 to meet global energy needs, he treads the road 
towards that goal. He examines readiness in industry and banking, 
research on new solar-collection materials such as perovskites, the 
state of back-up renewables and innovative batteries. With many 
governments and some utility companies primed for action, Goodall 
avers, the fossil century could be history within two decades. 


The Doomed City 

Arkady Strugatsky and Boris Strugatsky (Translated by Andrew 
Bromfield) CHICAGO REVIEW PRESS (2016) 

Doyens of Russian science fiction Arkady and Boris Strugatsky wrote 
this nihilistic ‘lost’ novel in the 1970s. In its English-language debut, 
we are dumped abruptly into the Experiment, a garbage-choked, 
baboon-infested city with an artificial sun and an eerily mismatched 
populace. Here, astronomer-turned-rubbish-collector Andrei begins 
a grim trek into the ideology of tyranny. A book that carries an 
Orwellian punch, and a crazed energy all its own. Barbara Kiser 
Part of CERN’s Large Electron–Positron Collider, on display at the National Museum of Scotland.




Workshop of the world 


Colin Macilwain talks to the curators of the National 
Museum of Scotland on the eve of a grand expansion. 


The afterlife of Dolly the sheep will take
an exciting turn next month: the clone
is the first attraction of a new, permanent
exhibition at the National Museum of
Scotland (NMS). Dolly, who died in 2003
and survives in taxidermied splendour, will
greet visitors at the entrance to a grand atrium
housing ten refurbished science, technology 
and design galleries in the Victorian building, 
which sits beside the University of Edinburgh. 

Since Dolly’s leap to fame (see go.nature.com/1lujdd4k),
the anticipated commercial bonanza in cloning has not
occurred. Nevertheless, the bovid is part of a rip-roaring story
of how a nation of 5 million people helped to 
forge the modern age. 

“Scotland was the workshop of the world,”
explains Klaus Staubermann, a science his- 
torian and principal curator of technology at 
the NMS. He alludes to the turn of the last 
century, when Glasgow's Clydeside hosted as 
much as one-quarter of global shipbuilding 
and locomotive production. But he brushes 
off any suggestion that the transformation 
from powerhouse to tourist attraction is a 
metaphor for Scotland’s recent history that 
runs too close to the bone. “The Scots have 
always had this ability to reinvent themselves,”
Staubermann says, noting the ‘Silicon
Glen’ of the 1980s. Here, much of Europe’s
computer hardware was assembled. More 
recent, if scattered, successes have emerged 
in computer gaming (Grand Theft Auto
originated here) and biomedical science.

National Museum of Scotland gallery opening
8 July 2016.
Edinburgh, UK.

The range of exhibits being installed does
justice to Staubermann’s grand narrative.
There is an accelerating ring from the disused
Large Electron–Positron Collider at
CERN (Peter Higgs, namesake of the boson, 
is at the University of Edinburgh). There’s 
the story of Glasgow pharmacologist and 
Nobel laureate James Black, who had a cen- 
tral role in developing beta blockers. And 
there will be a stunning array of electrical and
mechanical engineering, from the world’s
oldest Stirling engine to the tip of a modern
wind-turbine blade. 

The unifying theme of the six new sci- 
ence and technology galleries is to 
follow scientific concepts through 
to technology and production. 
The museum’s wider collec- 
tion reflects diligent hoard- 
ing of bits and pieces from the 
Industrial Revolution. “Many 
Scots have held on to their 
working machinery,” says 
Staubermann. His research includes the history of the
machine-tool industry, due to feature in a manufacturing
hall. (Europe's first numerically controlled machine tool
was built by electronics firm Ferranti in Edinburgh in
the 1960s.)

A model of a heat engine made by Robert Stirling in 1816.

The NMS is Britain’s most popular 
museum outside London, with 1.6 million 
visitors last year. It has enjoyed steady pub- 
lic and private support, while many muse- 
ums globally are suffering from cuts. The 
£14-million (US$20-million) refurbishment 
is supported by the UK national lottery, 
biomedical charity the Wellcome Trust, the 
Scottish government and private donors. It is 
part of an £80-million, 15-year ‘masterplan’ to
transform the premises, built in 1866. 

Touring the exhibition areas last month, 
surrounded by workers assembling displays, 
I saw how technology is transforming the 
craft of exhibiting itself. For example, lay- 
ers of interactivity allow visitors to delve as 
deep as they like. But there will also be older- 
school approaches on display. “Whenever 
I tell a taxi driver what I’m doing, the first 
thing they want to know is if there'll be plenty 
of buttons to push,” says Staubermann. 

The museum’s first director, George 
Wilson, was a professor of technology at the 
University of Edinburgh, and the institu- 
tions remain closely linked. Most museum 
curators teach at the university, and many 
are involved in joint research projects. 

Right now, the priority is the logistics 
of installing some 3,000 objects — three- 
quarters on public display for the first time 
— in time for the 8 July opening. Elsa Cox, 
curator of the energy hall, Energise, has just 
collected a control console from the decom- 
missioned Murchison oil platform in the 
North Sea. One of the challenges, she says, 
is getting industrial contributors to take her 
deadlines seriously. 

Energise exhibits will include an early 
prototype of Salter’s duck — a key device in 
the history of wave power — and the switch 
that connected the now-defunct Dounreay 
fast breeder reactor in Caithness to the grid. 
Visitors will get the chance to power a city for 
a day, Cox says, learning about the trade-offs 
between different sources of energy. 

The display won't take sides on the heated 
argument between the Scottish govern- 
ment, which backs wind-power, and the 
government of Britain at large, which has 
cut support for renewables and wants 
to increase nuclear capacity. “The 
Scottish government funds us,” 
says museum spokesman Bruce
Blacklaw. “It doesn't govern us.”


Colin Macilwain is a science-policy journalist, based in
Edinburgh, and editor of the policy newsletter Research Europe.

e-mail: cfmworldview@googlemail. 
Count cryptic species
in biodiversity tally 


The race to describe and
archive the planet's dwindling
biodiversity (see K.-D. B. Dijkstra
Nature 533, 172-174; 2016)
becomes even more urgent with
the realization that the task’s scale
may be an order of magnitude
greater than estimated.

Dijkstra notes that we have so 
far named only about 1.2 million 
of Earth’s estimated 8.7 million 
or so eukaryotic species. Such 
estimates are based largely on 
counts of invertebrate ‘species’ 
that are visually distinguishable 
(‘morphospecies’). However, 
genetic analysis has revealed 
that many supposedly uniform 
morphospecies are complexes of 
multiple, reproductively isolated 
lineages, each of which constitutes 
a separate but cryptic species 
(D. Bickford et al. Trends Ecol. 
Evol. 22, 148-155; 2007). 

These discoveries boost the 
biodiversity of even the largest 
vertebrates, such as elephants. 
The effect is greater in small 
vertebrates (such as lizards and 
frogs) and in invertebrates, which 
are often complexes of ten or 
even more species (P. M. Oliver 
et al. BMC Evol. Biol. 10, 386; 
2010). The quoted estimate of the 
number of (morpho)species on 
Earth could therefore be just 10% 
of the true species number. 
Michael S. Y. Lee South 
Australian Museum and Flinders 
University, Adelaide, Australia. 
Paul M. Oliver Australian 
National University, Canberra. 
mike.lee@samuseum.sa.gov.au 
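
The scale implied by the letter above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the figures quoted in the letter (about 1.2 million named species, an estimate of 8.7 million eukaryotic morphospecies) and assumes the letter's upper-end scenario, in which that estimate is just 10% of the true species number. The function and variable names are hypothetical.

```python
# Illustrative arithmetic only; all figures are those quoted in the letter.
NAMED_SPECIES = 1.2e6           # eukaryotic species named so far
MORPHOSPECIES_ESTIMATE = 8.7e6  # estimated eukaryotic (morpho)species
ASSUMED_FRACTION = 0.10         # assumption: estimate is 10% of the true total


def implied_true_species(morpho_estimate: float, fraction_of_true: float) -> float:
    """Return the true species count implied if the morphospecies
    estimate captures only `fraction_of_true` of all species."""
    return morpho_estimate / fraction_of_true


true_total = implied_true_species(MORPHOSPECIES_ESTIMATE, ASSUMED_FRACTION)
named_share = NAMED_SPECIES / true_total

print(f"Implied true species count: {true_total:,.0f}")
print(f"Share already named: {named_share:.1%}")
```

Under that assumption the implied total is roughly 87 million species, of which the 1.2 million named so far would be well under 2%. A smaller cryptic-species multiplier would give a correspondingly smaller total.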


Fewer papers would 
scotch early careers 


Daniel Sarewitz argues that the 
pressure to publish is fuelling 
irreproducibility, but we disagree 
that the solution is to publish 
fewer papers (Nature 533, 147; 
2016). In today’s competitive 
arena, asking this of scientists 
— particularly junior ones — is to 
ask them to fall on their swords. 
Investing more effort in fewer 


but ‘more complete’ publications 
could hold back early-career 
researchers, who already face 
fierce competition. To generate a 
first-author publication, graduate 
students on average take more 
than a year longer than they did 
in the 1980s (R. D. Vale Proc. 
Natl Acad. Sci. USA 112, 13439- 
13446; 2015). Introducing further 
delays for junior scientists is not 
an option as long as performance 
is rated by publication metrics. 

In our view, publishing less 
is not a feasible or responsible
way to improve data quality. 
This would be better achieved by 
increasing the transparency of 
peer review and by introducing 
alternative metrics as indicators 
of reproducibility. Science's goal is 
to share as much information as 
possible — not to withhold it. 
Gary S. McDowell, Jessica K. 
Polka The Future of Research, 
Abington, Massachusetts, USA. 
jessica.polka@gmail.com 


Pre-emptive action 
against EU invasives 


As appointed representatives of 
the European Union's Scientific 
Forum on Invasive Alien Species, 
we wish to point out that the EU 
regulation states priority should 
be given to the listing of invasive 
species “that are not yet present in 
the Union or are at an early stage 
of invasion” (see M. Lehtiniemi 
et al. Nature 533, 321; 2016). 

The risk assessments needed 
to include species on the EU list 
of concern were already available 
for most of the 37 species (see 
go.nature.com/gigftz) and were a 
natural starting point. Moreover, 
listing is a political process: 
actions to protect biodiversity 
are weighed against factors such 
as socio-economic interests. 
Adequate evidence is necessary to 
ensure that proper action is taken. 

Notably, none of the targeted 
37 species is established in all 
EU member states, and all have 
potential for future spread (see 
go.nature.com/28vtjpk). 

Any member state, including 
those where Lehtiniemi and 


colleagues are based, can develop 
and submit risk assessments for 
candidate species. The scientific 
forum assesses these according to 
regulation criteria. Member states 
and the European Commission 
then decide whether to regulate 
species at the EU level. 

Johan Naslund Swedish 
Environmental Protection Agency, 
Stockholm, Sweden. 

Erland Lettevall Swedish Agency 
for Marine and Water Management, 
Gothenburg, Sweden. 
johan.naslund@naturvardsverket.se 


Supporting women 
postdocs in Israel 


Israel’s Weizmann Institute
of Science is ranked among
the world’s top research
institutions, but only 16.5% of 
its faculty members are women. 
Although this is still too low for 
a multidisciplinary institution, a 
series of initiatives are gradually 
redressing the balance. 

In Israel, common hurdles for 
women scientists are magnified 
because there are only eight 
major universities and posts are 
few. These appointments carry 
an unwritten prerequisite for 
postdoctoral research experience 
overseas. But most Israeli PhD 
graduates are older than their 
peers abroad — military service is 
compulsory for women and men, 
so they often have families and 
relocation can be a problem.

To encourage more women to 
stay in academia, the Weizmann 
Institute launched the Israel 
National Postdoctoral Program 
for Advancing Women in Science 
in 2007. This awards US$40,000 
over two years to supplement 
the fellowships of ten women 
graduates selected annually to 
pursue research overseas. So 
far, 38 of the 96 recipients have 
returned to academic positions 
in Israel; 6 have academic posts 
abroad; and 5 returned to non- 
faculty appointments (47 are still 
overseas). 

This year, the Weizmann 
Institute set up an annual award of 
$20,000 over two years for female 


postdocs intending to split their 
research between the institute 
and a foreign lab. Several Israeli 
universities and the country’s 
Council for Higher Education 
now run similar programmes. 
Daniella Goldfarb Weizmann 
Institute of Science, Rehovot, Israel. 
daniella.goldfarb@weizmann.ac.il 


Limit uncertainties 
in land emissions 


Launching satellites to measure 
carbon dioxide emissions is 
only part of a more integrated 
solution to achieving the goals 
of the Paris climate agreement 
(Nature 533, 446-447; 2016). In 
our view, the biggest hurdle is 
to reduce the high uncertainty 
in CO₂ emission estimates —
particularly from land use. 

Land-use emissions are harder 
to quantify accurately than are 
those from, say, fossil fuels. To 
reduce uncertainties in tracking 
carbon from land use, it is crucial 
to monitor other data sources 
such as biomass. This can be done 
by remote sensing, for example 
with NASA’s GEDI LIDAR sensor
or the European Space Agency's 
proposed BIOMASS (P-band 
radar) sensor. 

We also need many more 
ground-based measurements 
of biomass and CO₂ exchange
with the atmosphere. However, 
space agencies’ budgets for 
calibrating and validating satellite 
products are limited, and there 
is inadequate coordination and 
data sharing between the remote- 
sensing and ground-based 
measurement communities (see 
A.K. Skidmore et al. Nature 523, 
403-405; 2015). 

We suggest that crowdsourced 
data from mobile-phone apps 
and more extensive sharing of 
ground-based measurements 
could have the greatest potential 
for improving the monitoring of 
biomass and CO₂ exchange — at a
fraction of the cost of satellites. 
Steffen Fritz, Dmitry 
Schepaschenko, Linda See 
IIASA, Austria. 
fritz@iiasa.ac.at 
OBITUARY


Thomas Kibble 


(1932-2016) 


Theoretical physicist and Higgs-boson pioneer. 


Tom Kibble contributed to our deepest
understanding of the fabric and forces
of the Universe. He is best known 
for his work on the phenomenon called 
spontaneous symmetry breaking — a cor- 
nerstone of the standard model of particle 
physics. His work led to the concept of a new
elementary particle now known as the Higgs 
boson, which was experimentally observed 
in 2012. Kibble was also a pioneer in apply- 
ing ideas from both high-energy physics and 
condensed-matter physics to the study of the 
early Universe. 

Thomas Walter Bannerman Kibble, who 
died on 2 June, was born in 1932 in Madras 
(now Chennai), India. His father was a 
professor of mathematics. He was sent to 
Edinburgh in 1944 to complete his second- 
ary education at Melville College. Between 
1951 and 1958, he pursued a degree in physics
and a PhD in mathematical physics at the
University of Edinburgh. During the final 
year of his PhD he married Anne Allan, with 
whom he had three children. They were hap- 
pily married until her death in 2005. 

In 1959, Kibble joined the theoretical- 
physics group at Imperial College London, 
starting an association with the institution 
that would last for nearly 60 years. It was 
in the summer of 1964 that, along with 
US physicists Gerald Guralnik and Carl 
Richard Hagen, he wrote a seminal paper 
on symmetry breaking (G. S. Guralnik et al. 
Phys. Rev. Lett. 13, 585-587; 1964). 

Symmetry and the breaking of symmetry 
are deep principles that arise in many differ- 
ent physical contexts. In a lump of iron, for 
example, the interactions of the atoms respect 
a perfect symmetry between different direc- 
tions in space. However, when the iron is 
cooled to below 770°C it generates a mag- 
netic field that points in a specific direction, 
selected spontaneously, and the rotational 
symmetry is said to be spontaneously broken. 

During the early 1960s, initial attempts to 
incorporate spontaneous symmetry break- 
ing into particle physics showed that certain 
kinds of massless particle would necessar- 
ily be predicted. However, Kibble, Guralnik 
and Hagen studied spontaneous symme- 
try breaking in the context of a realization 
of symmetry called gauge symmetry. In so 
doing, they reached the striking conclusion 
that, instead, certain kinds of elementary 
particles called vector bosons would actu- 
ally acquire mass. 

This beautiful mass-generating 
mechanism was independently described
slightly earlier in 1964, first in a paper by 
Belgian physicists Robert Brout and François
Englert, and then in a second paper by Peter
Higgs, who also explicitly noted that the 
mechanism gives rise to another massive 
particle, now known as the Higgs boson. 

The significance of these elegant ideas was 
not immediately recognized. The electro- 
magnetic force is mediated by the photon, a 
vector boson, but photons are massless. By 
contrast, the early theory of the weak nuclear 
force, another of the four fundamental forces 
of nature, did require massive vector bosons, 
now called the W and Z bosons. In a key 
single-author 1967 paper, Kibble showed 
how the symmetry-breaking mechanism can 
be generalized, and showed, crucially, how 
it can leave a vector boson, such as the pho- 
ton, massless while giving masses to others 
(T. W. B. Kibble Phys. Rev. 155, 1554; 1967). 

These insights became central 
components in the unification of the 
electromagnetic and weak nuclear forces 
into the electroweak theory. (Building on 
earlier work of Sheldon Glashow, this was 
formulated in 1967 by Abdus Salam and 
Steven Weinberg.) In 2010, Kibble shared 
the American Physical Society’s J. J. Sakurai 
Prize for Theoretical Particle Physics with 
the five other scientists credited with discov- 
ering the spontaneous symmetry-breaking 
mechanism. 

In 2013, Englert and Higgs were awarded 
the Nobel Prize for Physics. Many people in 
the theoretical-physics community, includ- 
ing Higgs, hoped that Kibble would be 
chosen to share this award. Kibble himself,
a man of great modesty, never expressed any
disappointment. 

In 1976, Kibble laid the foundations for 
a second key phase of his scientific career. 
Drawing on ideas from both high-energy 
physics and condensed-matter physics he 
realized that phase transitions in the early 
Universe could leave observable cosmologi- 
cal signatures. He showed that for certain 
conjectured theories of high-energy physics, 
the mechanism of spontaneous symmetry 
breaking predicts novel structures, including 
one-dimensional concentrations of energy 
that stretch across the cosmos, which he 
called cosmic strings. 

Kibble analysed how cosmic strings evolve 
after the phase transition and suggested that 
they could have a significant impact on the 
history of the early Universe. Measurements 
of the cosmic microwave background, the 
relic radiation from the Big Bang, now indi- 
cate that cosmic strings are not the domi- 
nant mechanism for creating the large-scale 
structure of galaxies. Nevertheless, the 
detection of cosmic strings in future experi- 
ments would still provide momentous new 
insights into the fundamental forces. 

Among his many honours, Tom was 
made a Commander of the Order of the British Empire
in 1998 and was knighted in 2014. He was 
especially proud of being the first recipient 
of the Nature/NESTA lifetime achievement 
award for mentoring in 2005. 

Tom was quiet and gently spoken, but 
he had an influential leadership role in UK 
academia. As head of the physics depart- 
ment at Imperial College from 1983 to 
1991, he skilfully steered it through a diffi- 
cult period of low funding for science in the 
United Kingdom. 

Tom was also an active campaigner. He 
joined the British Society for Social Respon- 
sibility in Science soon after its formation in 
1969, and for three years was chair of the 
organization’s national committee. He was 
also chair of Scientists Against Nuclear Arms 
from 1985 to 1991. 

Tom changed our understanding of 
the Universe at the deepest level. He was 
a man of great integrity and was regarded 
with much admiration and affection by his 
colleagues and students.


Jerome Gauntlett is head of theoretical 
physics at Imperial College London. 
email: j.gauntlett@imperial.ac.uk 
The dark side of antibiotics


Interactions in the gut between host cells and bacteria can determine a state of health or disease. A study investigates how 
antibiotic treatment can affect host cells in a way that drives growth of pathogenic bacteria. SEE LETTER P.697 


THIBAULT G. SANA & DENISE M. MONACK 


An unwanted side effect of antibiotics
can be an increase in pathogenic gut
bacteria. A study reported on page 697
by Faber et al.’ now shows that increased 
growth of the gut pathogen Salmonella enter- 
ica serovar Typhimurium (S. Typhimurium) 
after antibiotic treatment in mice is the result 
of sugar oxidation that is driven by a host 
enzyme. 

The mammalian gastrointestinal tract is 
colonized by a dense community of resident 
microbes known as the gut microbiota that 
not only helps us to digest certain foods, but 
also helps to prevent colonization by invading 
and potentially hostile microorganisms — a 
property known as colonization resistance. 
Perturbations to the microbiota, such as those 
caused by the use of oral antibiotics, often lead 
to increased colonization by various gut patho- 
gens, such as S. Typhimurium and Clostridium 
difficile. Broad-spectrum antibiotics deplete
the resident (commensal) microbiota, allowing
pathogens to proliferate, which in turn can lead
to gastrointestinal inflammation. Antibiotic-associated
diarrhoea and inflammation occur
in 5–25% of people treated with antibiotics and
are considered a major health problem.

Nutrient availability is a key factor in 
determining bacterial growth. It has been 
known for decades that pretreatment of mice 
with the antibiotic streptomycin increases 
the severity of colon inflammation induced 
by S. Typhimurium, and in the inflamed gut
there are multiple nutrients that can facilitate 
the replication of this bacterium. For exam- 
ple, S. Typhimurium and other members of 
the Enterobacteriaceae family can use etha- 
nolamine derived from the degradation of 
cell membranes as a source of carbon and 
nitrogen.

In addition, antibiotic-mediated disruption 
of the microbial food web can give rise to 
microbiota-liberated sugars in the gut that 
promote the growth of S. Typhimurium and 
C. difficile. Moreover, antibiotics also increase
expression of a host enzyme called inducible
nitric oxide synthase (iNOS), but the link
between iNOS and enhanced S. Typhimu- 
rium growth in the gut had not been previously 
established. The key advance in the paper by
Faber and colleagues is to provide mechanistic
insight into how antibiotic treatment leads
to iNOS-dependent generation of oxidized
sugars, which S. Typhimurium uses as a food
source to grow rapidly in the gut.

Figure 1 | Gut changes after antibiotic treatment. Faber et al. confirmed that treatment of mice with
antibiotics increases the expression of the host enzyme inducible nitric oxide synthase (iNOS) by an
unknown mechanism. iNOS can produce reactive oxygen species, and the authors propose that such
species can oxidize glucose and galactose sugars to glucarate and galactarate, respectively. These oxidized
sugars are metabolized by Salmonella enterica serovar Typhimurium (S. Typhimurium), as well as by
other members of the Enterobacteriaceae family, such as resident Escherichia coli. Antibiotic treatment
also results in the formation of sialic acid and fucose by the activity of resident bacteria; these nutrients
can also be metabolized by S. Typhimurium. Antibiotic treatment therefore ultimately creates a perfect
mix of nutrients for S. Typhimurium to proliferate.

The authors investigated the changes 
that occur after antibiotic treatment. After 
confirming that the treatment leads to 
overexpression of iNOS (by an unknown mech- 
anism), the authors showed that host-driven 
iNOS-dependent oxidization of the sugars 
glucose and galactose in the gut leads to the 
generation of the sugars glucarate and galac- 
tarate, both of which can be metabolized by 
S. Typhimurium. 

The researchers then characterized S. Typhi- 
murium genes involved in metabolizing 
oxidized glucose and galactose (the gudT 
ygcY gudD STM2959 operon). The expression
of these genes is induced by hydrogen, a
fermentation product of the microbiota,
indicating that the production of the enzymes 
that they encode is tightly regulated, and 
probably restricted to a gut-like environ- 
ment. Interestingly, related genes were 
found in other Enterobacteriaceae present 
in the resident bacterial community, such as 
Escherichia coli and Klebsiella oxytoca. The 
authors show that these genes are also impor- 
tant for E. coli to increase in numbers in the gut 
after antibiotic treatment. 

Because S. Typhimurium probably competes
with commensal bacteria for the oxidized sugars,
it is tempting to speculate that it might have
evolved specific mechanisms to outcompete
resident bacteria for the same food source. The
elucidation of such competition mechanisms
would add a new layer of complexity to our 
understanding of the microbiota’s response 
to antibiotics and will require further studies. 
Thus, the increase of S. Typhimurium in the 
gut after antibiotic treatment can be attrib- 
uted to the microbiota-delivered nutrients 
sialic acid and fucose and to host-mediated
oxidation of carbohydrates in the gut, providing 
diverse food sources for the pathogen (Fig. 1). 
Antibiotics are useful for treating susceptible 
bacterial infections and certainly provide 
human health benefits. However, with the 
emergence of multidrug-resistant pathogens 
that are predicted to kill 10 million people a
year by 2050, there is a dark side to antibiotics. 
The mechanism described by Faber et al. sheds 
light on another dark side. Indeed, increasing
numbers of studies now describe mechanisms
used by gut pathogens to take advantage of
the post-antibiotic period to hyper-replicate
within the gut and successfully infect the host.
Faber and colleagues’ work could lead to the
development of new and better therapeutic
approaches for preventing diseases that are a
consequence of antibiotic treatment.


Thibault G. Sana and Denise M. Monack
are in the Department of Microbiology and
Immunology, Stanford University School of
Medicine, Stanford, California 94305, USA.
e-mails: tsana@stanford.edu;
dmonack@stanford.edu


PHYSIOLOGY

Stressed-out chromatin
promotes longevity 


Two studies reveal that early-life malfunction in organelles called mitochondria
brings about lasting changes in how DNA is packaged. These alterations have 
consequences for cellular stress responses and organismal longevity. 


SIU SYLVIA LEE & JESSICA K. TYLER 


Organelles called mitochondria are
central to the generation of ATP
molecules, which are the energy
currency of cells. Mild mitochondrial impairment
in juvenile roundworms (Caenorhabditis
elegans) has been shown to increase lifespan.
Such dysfunction triggers multiple signalling
cascades, including a pathway called the mitochondrial
unfolded protein response (UPRmt),
which causes changes in gene expression that
enable cells and organisms to cope with stress.
Writing in Cell, Tian et al. and Merkwirth
et al. identify factors that relay mitochondrial
dysfunction to the UPRmt.

Changes in gene expression are largely 
brought about by alterations in how tightly 
DNA is packaged around histone proteins in a 
structure called chromatin. Chromatin regu- 
lation is achieved partly through chemical 
decorations that modify histones, modulating 
this packaging. The current studies reveal that 
histone-modifying enzymes are integral players
in the UPRmt stress response and lifespan
extension in C. elegans.

By using genetic screens, Tian and colleagues
found that the gene lin-65 is essential
for the induction of UPRmt in response to mitochondrial
stress. However, it should be noted
that lifespan was extended even in the absence 
of lin-65, suggesting that mitochondrial stress 


also induces other pathways that promote 
longevity. The authors demonstrated that 
mitochondrial stress causes the LIN-65 protein 
to shuttle from the cytoplasm to the nucleus, 
reminiscent of the UPRmt regulator and transcription
factor DVE-1, which also undergoes
stress-induced nuclear migration. Indeed, the
researchers found that LIN-65 is required for 
the nuclear relocalization of DVE-1. 

Tian et al. provided evidence to suggest that 
the addition of two methyl groups to amino-acid
residue lysine 9 of histone H3 (a modification
dubbed H3K9me2) in the cytoplasm by
the MET2 methyltransferase enzyme triggered 
DVE-1 to shuttle to the nucleus. However, the 
possibility that non-histone targets of meth- 
ylation are involved in this process cannot 
be ruled out, because the authors’ suggestion 
is based on methyltransferase inactivation, 
which could alter the methylation of many 
other proteins besides histones. The mecha- 
nism by which H3K9me2 promotes DVE-1 
nuclear migration remains unknown. 

The H3K9me2 modification is generally 
associated with tightly packaged chroma- 
tin. Indeed, Tian and colleagues observed 
that nuclei in worms exposed to mitochon- 
drial stress consistently looked smaller 
and more dense than those in unstressed 
worms. DVE-1 localized to discrete spots away 
from DNA-dense regions, suggesting that this 
factor preferentially targets DNA regions that 
are less tightly packaged. This work supports
a model whereby stress, through H3K9me2, 
induces changes in chromatin packaging that 
restrict the number of DNA regions to which 
DVE-1 can bind to regulate transcription. 
Thus, stress leads to specific changes in gene 
expression (Fig. 1). 

In a complementary study, Merkwirth et al.
identified the genes jmjd-1.2 and jmjd-3.1 as
crucial for the activation of the UPRmt pathway
and extended lifespan in worms that had mitochondrial
impairment. These genes encode
enzymes that remove methyl groups from a 
histone modification dubbed H3K27me3, 
which is associated with tightly packaged chro- 
matin. Mitochondrial impairment increases 
the expression of these two histone demethyl- 
ases, which presumably leads to unpackaging 
of chromatin (Fig. 1). Overexpression of 
jmjd-1.2 and jmjd-3.1 in C. elegans recapitulates
many of the effects of mild mitochondrial
stress, including increased longevity and
UPRmt induction, perhaps because increased
levels of JMJD-1.2 and JMJD-3.1 repress the 
expression of many mitochondrial genes, 
thereby impairing mitochondrial function. 

In both studies, the histone modifiers 
seemed to exert their effects predominantly 
on juvenile worms. It is therefore tempting 
to speculate that chromatin is most receptive 
to stress-induced reorganization in early life, 
and that these changes help to establish a stress 
response that persists into adulthood. His- 
tone modifications can be stably maintained 
through cell divisions, even through genera- 
tions, making them good candidate substrates 
for a ‘cellular memory’ of mitochondrial dys- 
function. It will be interesting to determine 
whether H3K9me2 methyltransferases and 
H3K27me3 demethylases are required for 
maintenance of the mitochondrial-dysfunction-induced
transcriptional program and
activation of the UPRmt through to adulthood,
even when the mitochondrial stress that 
triggered it is removed. 

Figure 1 | Coordinated responses to mitochondrial stress. Gene expression can be inhibited by the
tight packaging of DNA as chromatin around histone proteins that are decorated with methyl groups at
specific amino-acid residues. Two such repressive modifications are dubbed H3K9me2 and H3K27me3.
Two studies reveal that mitochondrial stress triggers the unpackaging and expression of genes
involved in a stress response called the UPRmt. Tian et al. find that movement of the protein LIN-65
from the cytoplasm to the nucleus triggers nuclear migration of the transcription factor DVE-1. The
enzyme MET2 mediates the addition of H3K9me2 modifications to histones in the cytoplasm; these
histones move to the nucleus, restricting the unpackaged regions where DVE-1 can bind to promote
transcription. Merkwirth et al. show that JMJD-1.2 and JMJD-3.1 enzymes are upregulated in response
to mitochondrial stress and remove methyl groups from H3K27me3 to form the transcription-promoting
modification H3K27me1 in the vicinity of UPRmt genes.

Because regulation of gene expression is
already known to be central to the response
to mitochondrial stress, the participation of
histone modifiers is not in itself unexpected. 
However, the current studies suggest that 
chromatin-modifying enzymes can be highly 
selective, eliciting programs of gene expres- 
sion that are specific to a particular stress or 
pathway. Patterns of histone modifications 
can be extremely diverse, so it is not difficult 
to imagine how such exquisite specificity can 
arise. There are hundreds of histone modifica- 
tions and many enzymes that have distinct and 
overlapping roles in depositing and removing 
them. DNA is wound around a nucleosome 
complex composed of four different histones, 
and each histone can be modified at many 
positions, so the number of possible com- 
binations of histone modifications on each 
nucleosome is huge. Moreover, each nucleo- 
some can have distinct modifications from 
its neighbours, building up an astronomically 
complex ‘histone code’. 
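The scale of this combinatorial argument can be illustrated with a short calculation. The sketch below is not taken from either study: the number of modifiable sites per histone and the number of states per site are hypothetical placeholders, and only the count of four histone types comes from the text. It simply shows how quickly the number of possible patterns grows.

```python
# Illustrative only: site and state counts are hypothetical, not measured values.

def nucleosome_patterns(histone_types: int, sites_per_histone: int,
                        states_per_site: int) -> int:
    """Number of distinct modification patterns on one nucleosome,
    assuming every site varies independently of the others."""
    total_sites = histone_types * sites_per_histone
    return states_per_site ** total_sites


def array_patterns(nucleosome_count: int, patterns_per_nucleosome: int) -> int:
    """Number of patterns across a run of neighbouring nucleosomes,
    assuming each nucleosome is modified independently."""
    return patterns_per_nucleosome ** nucleosome_count


# Four histone types (from the text); 10 sites and 2 states are assumptions.
per_nucleosome = nucleosome_patterns(histone_types=4, sites_per_histone=10,
                                     states_per_site=2)
across_three = array_patterns(nucleosome_count=3,
                              patterns_per_nucleosome=per_nucleosome)
print(per_nucleosome, across_three)
```

Even with only two states per site, the count grows exponentially with the number of sites and again with the number of neighbouring nucleosomes, which is the sense in which the code is 'astronomically complex'.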

Both groups found that the regulation 
of chromatin in response to mitochondrial 
stress is tissue specific. Tian et al. showed that 
LIN-65 activates the UPRmt pathway in the
intestine, whereas Merkwirth et al. reported 
that JMJD-1.2 and JMJD-3.1 trigger the same 
response in neurons. An intriguing possibility 
is that distinct chromatin-regulating factors 
in different cells and tissues sense mitochon- 
drial stress and respond in divergent yet coor- 
dinated ways that best enable organisms to 
counter the stress. It will be crucial to elaborate 
how stress signals can be coupled to chromatin
regulation, and how cell-type-specific regula- 
tion can be coordinated. Perhaps mitochondrial 
impairment activates signalling cascades that 
regulate the expression, movements, modifica- 
tions and activities of distinct histone modifiers 
in specific cells. Mitochondrial impairment 
could also result in altered levels of metabolic
intermediates, many of which act as essential
co-factors of histone-modifying enzymes. 

The idea that a stress signal can engage 
specific histone modifiers to elicit a persistent 
response will no doubt extend to diverse stressors.
Indeed, JMJD-3.1 has been implicated in
the dampening of a stress response called the
heat-shock response from early adulthood
through to ageing in C. elegans. Additionally,
by-products of cellular metabolism called free
radicals modulate longevity in yeast through
a histone-modifying enzyme. We can look
forward to many more exciting insights into 
the role of chromatin regulation in stress 
responses. 
An extended yardstick
for climate variability 


Decoded and precisely dated information encrypted in stalagmites from a cave
in China reveals past climatic changes and provides insight into the complex
interactions in today’s climate system. SEE LETTER P.640


NELE MECKLER 


Cave stalagmites represent a meticulous
archive of climatic events stretching 
back over many hundreds of thousands 
of years and, unlike many geological deposits 
on land and in the ocean, allow the timing of 
these events to be precisely determined. On 
page 640 of this issue, Cheng et al. decipher
a treasure trove of information concealed
in the stalagmites of the Sanbao Cave in the
mountains of central China. Their findings 
extend one of the most valuable benchmark 
climate records available, which now spans 
an impressive 640,000 continuous years of 
Asian monsoon history with unprecedented 
dating precision. The Asian monsoon is a cru- 
cial component of the global climate system, 
so these data not only provide age constraints 
for other palaeoclimate time series, but also
improve our understanding of the interactions,
forcing factors and responses of today’s climate
systems.

Figure 1 | Stalagmites in Reed Flute Cave, Guilin, China. Mineral deposits such as stalagmites
can be used as a record of climate.
WILBUR E. GARRETT/NATL GEOGRAPHIC/GETTY

As they grow upwards from the cave floor, 
stalagmites accumulate layer upon layer of 
calcium carbonate (Fig. 1). Deposited from 
oversaturated drip waters, this compound is 
imprinted with the water’s chemical compo- 
sition, which, in turn, carries the chemical 
signature of rainwater falling above the cave. 
The oxygen isotope composition of this water 
(heavy ¹⁸O compared with light ¹⁶O, reported
as δ¹⁸O) is an indicator of the climate conditions
at the time the rain fell (Fig. 2). Dissolved
uranium leached from the rocks above the 
cave is also trapped in the forming calcium 
carbonate, acting as a radiometric clock as 
it slowly decays. This allows stalagmite age 
to be accurately determined back to around 
650,000 years ago. The older the material, 
the less exact this clock becomes, and the 
more effort is needed to determine precise age 
constraints. Cheng et al. go to the limit of what 
can be achieved, and their precisely dated 
stalagmite record covers more than six glacial- 
to-interglacial cycles. 
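Why the radiometric clock loses precision with age can be seen in a deliberately simplified sketch. It is not the dating procedure used by Cheng et al.: it assumes a closed system, no thorium at the time of deposition and uranium-234 in equilibrium with uranium-238, so that the thorium-230 to uranium-238 activity ratio grows as 1 − exp(−λt). The half-life value is an approximate figure supplied here as an assumption, and the function names are illustrative.

```python
import math

# Assumed, approximate half-life of thorium-230 in years (not given in the text).
TH230_HALF_LIFE_YEARS = 75_600.0
DECAY_CONSTANT = math.log(2) / TH230_HALF_LIFE_YEARS


def activity_ratio_at_age(age_years: float) -> float:
    """Th-230/U-238 activity ratio after age_years in the simplified model."""
    return 1.0 - math.exp(-DECAY_CONSTANT * age_years)


def age_from_activity_ratio(ratio: float) -> float:
    """Invert the ingrowth equation to recover an age from a measured ratio."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must lie in [0, 1) for this simplified model")
    return -math.log(1.0 - ratio) / DECAY_CONSTANT


def age_error_for_ratio_error(age_years: float, ratio_error: float) -> float:
    """First-order age uncertainty for a given uncertainty in the ratio:
    dt = d(ratio) / (lambda * (1 - ratio))."""
    ratio = activity_ratio_at_age(age_years)
    return ratio_error / (DECAY_CONSTANT * (1.0 - ratio))


# The same measurement error costs far more age precision in old material.
for age in (100_000.0, 300_000.0, 600_000.0):
    print(age, round(age_error_for_ratio_error(age, ratio_error=0.001)))
```

As the ratio approaches its equilibrium value, a fixed measurement error maps onto an ever larger age error, which matches the statement that the older the material, the less exact the clock becomes. Real uranium-thorium dating also accounts for uranium-234 excess and initial thorium, which this sketch omits.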

Climatic events are often globally inter- 
connected, sometimes to a surprising extent. 
For example, during glacial periods there is 
a remarkable synchrony between dry events 
in the Asian monsoon climate and cold 
events in the North Atlantic region: this is 
related to massive discharge of icebergs and 
changes in ocean and atmospheric circulation.
The repercussions extend as far south
as Antarctica, which warms during these
events, and include alterations in the concentrations
of methane and carbon dioxide
in the atmosphere. Such connections are well
established for periods during which good 
age control is available for all archives, and 
can provide crucial lessons about our climate 
system. Under the (reasonably safe) assump- 
tion that the synchrony observed for the more 
recent past holds further back in history, the 
extended cave record described by Cheng et al. 
can be used as an indirect age constraint for 
other archives that lack direct radiometric
age control. This information therefore greatly 
extends the time over which we have precisely 
dated climate data around the globe. 

It might be asked why long time series are 
important, given that it is so much easier to 
obtain well-dated climate records for the 
more recent past. The issue is the multitude of 
contributing factors that coincide at any one 
time. For example, Earth's climate is strongly 
influenced by the configuration of its orbit 
relative to the Sun, which changes cyclically. 
The time of the year at which Earth is closest to 
the Sun changes every 20,000 years or so (the 
precession cycle). Cyclical changes in the tilt of 
Earth’s axis (the obliquity cycle) occur with a 
periodicity of around 40,000 years, leading to 
additional variations in the strength of winter 
and summer seasons at high latitudes. Which
of these two cyclical variations is responsible
for the observed duration of glacial cycles of
roughly 100,000 years has long been a matter
of debate.

The answer to this question can come only 
from a well-dated climate time series that 
covers a sufficient number of these cycles — 
like that amassed by Cheng and colleagues. 
Although this record reflects monsoon 
strength and not the extent of glaciation, 
50 Years Ago


Atlantic tunas will share with
whales some immunity from
hunting as a result of a convention
on conservation drafted ... by
the representatives of seventeen
nations ... The convention has been 
prompted by evidence in recent years 
that more boats are chasing fewer 
fish. It applies to the whole of the 
Atlantic Ocean and adjacent seas, 
and the hope is that it will maintain 
the population of tuna ... at those 
levels which permit the greatest 
sustainable catch. The commission 
that will be responsible for 
administering the convention ... will 
be empowered to collect information 
and to recommend research 
programmes. It may also, on the 
basis of the evidence it collects, 
recommend limits on the rate of 
catch in the convention area. 

From Nature 25 June 1966 


100 Years Ago 


I refer to the continued proposals
to found fresh scholarships for
the encouragement of scientific
research, accompanied as they so 
often are by statements as to the lack 
of trained men in science ... The 
men who have held ... scholarships 
for two or three years form a body 
highly trained in the best English 
and Continental universities ... Yet 
we see on all hands these men barely 
able to make a living ... They are in 
general men of all-round education, 
with specialised knowledge in 
science in addition; they are not 
particularly uncouth, unpractical, 
or unbalanced, as popular tradition 
would have men of science to be. 

It is this addition of specialised 
knowledge that ... is the greatest 
obstacle to their earning a living; 
they would probably be better paid 
if they turned their hand to any 
employment other than the pursuit 
of science. E.N.daC. Andrade, British 
Expeditionary Forces 

From Nature 29 June 1916 
Figure 2 | Archiving of climate events. Cheng et al. measure the relative proportions of the
oxygen isotopes ¹⁸O and ¹⁶O in stalagmites to determine the intensity of past monsoons in central
China. Rainwater acidified by soil carbon dioxide leaches calcium carbonate (CaCO₃) and traces of
radioactive uranium (U) from limestone rock above the cave. As it percolates into the cave, the water
loses part of its CO₂, causing calcium carbonate to precipitate out, forming stalactites and stalagmites
on the roof and floor of the cave, respectively. This carbonate retains the rainwater’s original
oxygen-isotope composition (an indicator of the climate conditions when the rain fell) and the
dripwater’s uranium. As the uranium slowly decays to thorium (Th), it acts as a record of the time since
calcite deposition. Cheng et al. use these processes to compile a remarkable time series of monsoon
variability over 640,000 years.


a previously established link between 
particularly strong dry events and glacial 
terminations allowed the authors to deter- 
mine the exact timing of the past seven glacial 
terminations. These all relate to specific times 
in the precession cycle and occur every four 
or five precession cycles, revealing a promi- 
nent role for this orbital cycle. 

In general, the extended cave record shows
that the pattern of monsoon variability seen in
the more recent past also occurred in earlier 
periods, indicating that it is an inherent feature 
of natural climate variability. Cyclical varia- 
tion in monsoon strength in relation to orbital 
cycles is particularly prominent. Furthermore, 
dry events of shorter duration are evident 
throughout the record. Cheng et al. investigate 
these dry periods by removing the orbitally
controlled cyclicity from their monsoon
record and studying the remaining suborbital
signals (termed Δδ¹⁸O). This approach could
create artefacts, a possibility investigated by
Cheng et al. (see Extended Data, Figs 5, 6),
but allows the suborbital features to stand out
more clearly in this residual time series, and
again, these seem to be related to orbital variations
in the precession cycle. If it holds up, this
could be an invaluable clue about the unknown
cause of such short-lived events.

The extended time series that Cheng et al.
derive from the Sanbao Cave stalagmites has
an array of other interesting features that call
for further investigation. This wealth of new
information on past climate variability over an
exceptionally long time period and with precise
age control will bring us a step closer to
understanding the drivers and interactions in
our climate system.


Transmissible tumours
under the sea


In some species, cancer cells can be directly transmitted between individuals.
An analysis in shellfish now shows that some transmissible cancers can even
cross the species barrier. SEE LETTER P.705


ELIZABETH P. MURCHISON


On page 705, Metzger et al. report the
discovery that transmissible cancers
are widespread in one group of marine
shellfish, known as bivalves, and that such
cancers can even jump between species. These
findings suggest that cancer cells are common
infectious agents in marine environments, and
challenge our understanding of the nature of
cancer and its interaction with its hosts.
Cancer occurs when a single cell in the body
acquires genetic changes that drive inappropriate
cell proliferation. Once initiated, cancer
evolves by natural selection, often producing
cell lineages that spread through the host by
a process called metastasis. However, cancer
does not normally spread beyond the
host’s body. Until now, such transmissible
cancers — cancer-cell lineages with the 
potential to metastasize through an animal 
population — were considered to be exceed- 
ingly rare. Only four examples were known 
in nature: two affecting Tasmanian devils, 
one in dogs and another in soft-shell clams.
Metzger and co-workers now report four pre- 
viously unidentified transmissible cancers: one 
that affects mussels (Mytilus trossulus) found 
in British Columbia, one that affects golden 
carpet shell clams (Polititapes aureus) on the 
Iberian coast and two transmissible cancers 
of probably independent origin in common 
cockles (Cerastoderma edule). 

These cancers all cause a leukaemia-like 
disease in affected individuals called dissemi- 
nated neoplasia, which had previously been 
observed, and which manifests as an excess of 
large, abnormal cells in the circulatory system. 
Diseased animals have thick, opaque circula- 
tory fluid, and their tissues become clogged 
with invasive cancer cells. The tendency for
many bivalve species to develop disseminated
neoplasia has been known since the 1960s, but
the underlying cause of the condition was not
understood.

Figure 1 | Cancer cells can be transmitted between shellfish species. A study by Metzger et al. has found that a transmissible cancer occurring in golden carpet
shell clams (Polititapes aureus) originated in a different species, the pullet shell clam (Venerupis corrugata). Although the two clam species share a habitat, the
cancer is currently detected only in golden carpet shell clams, suggesting that pullet shell clams may have acquired resistance to infection by this cancer.

Metzger and colleagues performed a genetic 
analysis of cancer and host tissues from 
several individual mussels, cockles and golden 
carpet shell clams. They found that, in many 
cases, the cancer cells bore no genetic simi- 
larity to their hosts, but instead were highly 
similar to cancerous tissues derived from 
other individuals of the same bivalve species. 
These findings confirmed that many cases 
of disseminated neoplasia in bivalve species 
are due to horizontal transfer of living cancer 
cells between hosts. 

A particularly unexpected finding of 
Metzger and colleagues’ work was that DNA 
extracted from cancer cells in golden carpet 
shell clams showed no genetic match with 
normal DNA from this species, but instead 
indicated that the cancer cells originated in 
a different species — the pullet shell clam 
(Venerupis corrugata). Surprisingly, however, 
pullet shell clams — which share a habitat with 
golden carpet shell clams — are not known 
to have a high prevalence of disseminated 
neoplasia. Perhaps the pullet shell clam has 
adapted to resist infection by the transmissible 
cancer that first arose in a member of its own
species; despite this, the cancer has survived by 
engrafting to a new host species (Fig. 1). 

Altogether, these findings seem to paint a 
picture of shellfish beds around the world that 
are awash with microscopic cancer cells meta- 
stasizing both within and between species. 
Although the mechanisms of cancer transmis- 
sion remain unclear, the immobile nature of 
these filter-feeding invertebrates suggests that 
the cancer cells may float through the marine 
environment and enter their hosts by breach- 
ing the digestive or respiratory tracts. The 
mode by which cancer cells exit their diseased 
hosts is another puzzle. Perhaps this is a passive
process enabled by trauma or predation,
or maybe cancer cells actively migrate out of
the body by co-opting host signalling path- 
ways. Investigating the density and viability 
of free-living bivalve neoplastic — and non- 
neoplastic — cells in the external marine 
environment will be an interesting area for 
future study. 

Although disseminated neoplasia has 
been reported in many bivalve species, the 
current work and previous studies reveal
that its prevalence varies greatly both within 
and between species. Variable prevalence of 
bivalve transmissible cancers, particularly 
within localized populations, hints at a fierce, 
ongoing pathogen-host evolutionary arms race 
beneath the sea. Although any mechanisms 
of host immunity to the cancers are unknown, 
their elucidation will provide insight into 
the diversity of cancer immunological and 
immune-evasion processes across species. 
Furthermore, it is not known how frequently 
new disseminated neoplasias arise in bivalves; 
identifying the genetic changes that distin- 
guish cancers that remain in one host from 
those that become transmissible may provide 
valuable information about the mechanisms 
of transmissibility. 

Determining the timescales and geographical 
distances that underpin the evolutionary his- 
tories of transmissible cancers in bivalves will 
provide a greater understanding of these diseases.
It is possible that, like the canine transmissible
cancer, these cancers are ancient cell
lineages that have co-evolved with their hosts 
through the millennia; or perhaps their emer- 
gence is a relatively recent occurrence, possibly 
stimulated by infectious agents, environmental 
changes, aquaculture or other anthropogenic 
activities. 

The potential for cancer cells to become 
free-living infectious agents raises questions 
about the implications for cancer transmis- 
sion in humans. Although person-to-person 
transmission and survival of cancer cells has 
been reported during organ transplantation, 
pregnancy, experimental treatments and
surgical accidents, such exchanges are rare and
never spread beyond transfer between two
individuals. Interestingly, however, the recent
discovery of tapeworm neoplastic cells that
spread within their severely immunocompromised
human host consolidates Metzger and
co-workers’ finding that cancers can invade
new host species.

The risk of cancer is inherent in multicellular 
organisms, and the basic evolutionary drive 
of this disease does not respect individual 
or even species barriers. Bivalve transmis- 
sible cancers provide a new model system 
in which to explore cancer transmission 
and host response. An understanding of 
the aetiology of disseminated neoplasias in 
these animals is also a boon for the aqua- 
culture industry, providing new opportuni- 
ties for disease biomonitoring and control. 
The discovery of widespread transmissible 
cancers under the sea is an exciting concep- 
tual advance, and opens up further avenues 
for cancer research.
Paris Agreement climate proposals need
a boost to keep warming well below 2°C 


Joeri Rogelj, Michel den Elzen, Niklas Höhne, Taryn Fransen, Hanna Fekete, Harald Winkler, Roberto Schaeffer, Fu Sha,
Keywan Riahi & Malte Meinshausen


The Paris climate agreement aims at holding global warming to well below 2 degrees Celsius and to “pursue efforts” to 
limit it to 1.5 degrees Celsius. To accomplish this, countries have submitted Intended Nationally Determined Contributions 
(INDCs) outlining their post-2020 climate action. Here we assess the effect of current INDCs on reducing aggregate 
greenhouse gas emissions, its implications for achieving the temperature objective of the Paris climate agreement, and 
potential options for overachievement. The INDCs collectively lower greenhouse gas emissions compared to where current 
policies stand, but still imply a median warming of 2.6-3.1 degrees Celsius by 2100. More can be achieved, because the 
agreement stipulates that targets for reducing greenhouse gas emissions are strengthened over time, both in ambition and 
scope. Substantial enhancement or over-delivery on current INDCs by additional national, sub-national and non-state 
actions is required to maintain a reasonable chance of meeting the target of keeping warming well below 2 degrees Celsius. 


In December 2015, the Paris Agreement, a new global agreement to combat
climate change, was adopted under the United Nations Framework Convention
on Climate Change (UNFCCC). In preparation for this agreement, countries
submitted national plans that spell out their intentions for addressing the
climate change challenge after 2020. These Intended Nationally Determined
Contributions (INDCs) address a range of issues, which can relate to
avoiding, adapting to or coping with climate change, among other things.
Nevertheless, targets and actions for reducing greenhouse gas (GHG)
emissions are core components. At this point, the INDCs are not final and
can be modified up until the time the Paris Agreement is ratified. However,
for now they represent our best understanding of the climate actions
countries intend to pursue after 2020.

The overarching climate goal of the Paris Agreement is to hold “the
increase in the global average temperature to well below 2°C above
pre-industrial levels and to pursue efforts to limit the temperature increase
to 1.5°C above pre-industrial levels”. This climate goal represents the
level of climate change that governments agree would prevent dangerous
interference with the climate system, while ensuring sustainable food
production and economic development, and is the result of international
discussions over multiple decades. Limiting warming to any level
implies that the total amount of carbon dioxide (CO2) that can ever be
emitted into the atmosphere is finite. From a geophysical perspective,
global CO2 emissions thus need to become net zero. About two thirds
of the available budget for keeping warming to below 2°C have already
been emitted, and increasing trends in CO2 emissions indicate that
global emissions urgently need to start to decline so as to not foreclose
the possibility of holding warming to well below 2°C (refs 13, 14). The
window for limiting warming to below 1.5°C with high probability and
without temporarily exceeding that level already seems to have closed.
The Paris Agreement implicitly acknowledges these insights and aims to
reach a global peak in GHG emissions as soon as possible together with
achieving “a balance” between anthropogenic emissions and removals
of GHGs in the second half of this century. Both targets are in principle
consistent with the temperature objective of the Agreement, but raise
the broader question of whether current INDCs are already putting the
world on a path towards achieving them.

Besides the climate question, the first round of INDCs also raises 
many other issues. These include whether efforts are distributed equi- 
tably among countries; how much adaptation may be required given 
the current level of mitigation ambition; how ‘intended’ national pro- 
posals will be implemented; how they will be financed; and the extent 
to which the INDCs contribute to the achievement of other goals of the 
UNFCCC by building on institutions that can support adaptation to 
climate change, technology advancement, development path transfor- 
mation, sustainable development and enhanced awareness. Although 
these issues are important for many countries, they fall outside the scope 
of this analysis. 

In this Perspective, we assess the implications of the current INDCs 
for GHG emissions, including the main factors and uncertainties that 
influence the levels of GHG emissions in 2030—the latest year covered 
by the vast majority of INDCs—and we explore the consistency of these 
reductions with the objective of the Paris Agreement (to keep warming 
well below 2°C and pursue efforts towards 1.5°C). This work updates and 
expands work undertaken in the framework of the 2015 United Nations 
Environment Programme (UNEP) Emissions Gap Report, an authoritative
annual assessment that has tracked climate policy action over the
past six years, and provides a synthesis of a wide range of INDC modelling
studies that are available in the public domain. The number of
INDCs considered by the studies that we assess here ranges from the 118 
INDCs submitted by 1 October 2015 to the 160 INDCs submitted by 
12 December 2015 (Supplementary Tables 1 and 2). These 118 to 
160 INDCs cover emissions from 145 to 187 out of 195 Parties to the 
UNFCCC, which in turn were responsible for roughly 88% to more than 
96% of global GHG emissions in 2012. We also look at projections of 
global-mean temperature increase over the twenty-first century that 
would be consistent with the INDCs, and at post-2030 implications for 
limiting warming to no more than 2°C. Finally, we discuss options to
further reduce global GHG emissions in 2030 from their INDC levels
towards levels that are more consistent with a long-term global pathway
that limits warming to well below 2°C.

Netherlands. 6World Resources Institute, Washington DC, USA. 7Energy Research Center, University of Cape Town, Cape Town, South Africa. 8Universidade Federal do Rio de Janeiro (COPPE/UFRJ),
Rio de Janeiro, Brazil. 9National Center for Climate Change Strategy and International Cooperation, Beijing, China. 10Graz University of Technology, Graz, Austria. 11Australian-German Climate and
Energy College, School of Earth Sciences, The University of Melbourne, Melbourne, Victoria, Australia. 12PRIMAP Group, Potsdam Institute for Climate Impact Research (PIK), Potsdam, Germany.

BOX 1
Scenario definitions

Scenarios represent alternative images of the future, or “[stories] about what happened in the future”. They are neither predictions nor
forecasts, but tools to understand how the future might unfold under a consistent set of assumptions. In this analysis, we use four types of
scenarios, drawn from a wide variety of sources.

No-policy baseline scenarios. These are emissions projections that assume that no new climate policies have been put into place from 2005
onwards. We select these scenarios from the IPCC AR5 Scenario Database, which is hosted at the International Institute for Applied Systems
Analysis (IIASA, https://tntcat.iiasa.ac.at/AR5DB/). By design, these no-policy baseline scenarios exclude climate policies, but may include
other policies that can influence emissions and are implemented for other reasons, such as energy efficiency or energy security policies.

Current-policy scenarios. These consider the most recent estimates of global emissions and take into account implemented national policies.
This is different from the INDC scenarios (described below), which reflect international pledges and intended policies. Here, we draw these
scenarios from three global analyses.

INDC scenarios. These project how global GHG emissions evolve under a successful implementation of the INDCs. These projections are based
on ten global INDC analyses (Supplementary Table 2 provides an overview), in which calculations can be based on official estimates from
countries or on documents submitted to the UNFCCC (such as national GHG inventories, national communications, biennial reports or biennial
update reports). INDCs were submitted before the Paris summit; under the Paris Agreement, future mitigation contributions will be referred to
as NDCs, without the ‘intended’.

2°C scenarios. These are idealized global scenarios limiting warming to well below 2°C, keeping open the option of strengthening the global
temperature target to 1.5°C. These scenarios are based on a subset of scenarios from the IPCC AR5 Scenario Database (Supplementary Table
3) that meet the following criteria: they have a greater than 66% probability of keeping warming to below 2°C by 2100 (this probability does
not drop below 60% at any point during the entire twenty-first century); until 2020, they assume that actions that were pledged earlier under
the UNFCCC Cancun Agreement are fully implemented; and, after 2020, they distribute emission reductions across regions, gases and sectors
so that the total discounted costs of the necessary global reductions are minimized. These scenarios distribute emissions reductions among
regions in the most cost-optimal way, and are often referred to as least-cost or cost-optimal trajectories. However, this does not imply that the
actual costs to achieve this cannot be distributed differently, for example, on the basis of other equity principles.

A separate set of scenarios is used to examine the post-2030 implications of current INDCs for 2°C (Supplementary Table 4).

All scenarios are expressed in terms of billions of tonnes of global annual CO2-equivalent emissions (Gt CO2-eq yr⁻¹). CO2 equivalence of GHGs
has been calculated by means of 100-year global warming potentials.


We use four scenario groups to frame the implications of the INDCs 
for global GHG emissions in 2030: no-policy baseline scenarios, cur- 
rent-policy scenarios, INDC scenarios and least-cost 2 °C scenarios. Their 
definitions and descriptions are provided in Box 1. 
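
Box 1 notes that all scenarios are expressed in CO2-equivalent emissions using 100-year global warming potentials. A minimal sketch of that aggregation is given below. The GWP values are the 100-year values of the IPCC Fourth Assessment Report (CH4 = 25, N2O = 298) and are an assumption here, because the text does not state which set of potentials the assessed studies applied. The inventory numbers are hypothetical and serve only to illustrate the weighting.

```python
# 100-year global warming potentials (IPCC AR4 values; an assumption,
# the text only says "100-year global warming potentials").
GWP100 = {"CO2": 1.0, "CH4": 25.0, "N2O": 298.0}


def co2_equivalent(emissions_mt):
    """Aggregate per-gas emissions (Mt of each gas) into Mt CO2-eq."""
    return sum(GWP100[gas] * amount for gas, amount in emissions_mt.items())


# Hypothetical annual inventory in Mt of each gas (illustrative only).
inventory = {"CO2": 1000.0, "CH4": 10.0, "N2O": 1.0}
total = co2_equivalent(inventory)
print(f"{total:.0f} Mt CO2-eq per year")
```

Swapping in a different set of potentials changes the aggregate total, which is one reason the choice of metric appears later among the confounding factors.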


Aggregate emissions impact of INDCs 
A first, obvious question to ask is what the submitted INDCs deliver in 
terms of GHG emissions out to 2030. What sounds like simple arith- 
metic turns out to be a more complicated accounting exercise with an 
array of possible outcomes. Some countries provide a range instead of 
a single number of emissions reductions in their INDCs. Many INDCs 
lack necessary details, such as clarity on sectors and gases covered, details 
on the impact of listed mitigation actions, different metrics to aggregate 
gases, details on base year or reference values from which reductions or 
improvements would be measured, or accounting practices related to land 
use and the use of specific market mechanisms. This murkiness
complicates a precise estimate of their impact on emissions. Finally, some of the
actions listed in INDCs are, either implicitly or explicitly, conditional on 
other factors, such as the availability of financial or technological support. 
All these factors can be interpreted differently and influence the range of 
possible outcomes. In our assessment, we distinguish between a
conditional and an unconditional INDC scenario, with associated
uncertainties. Interestingly, the Paris Agreement does not adopt such a distinction,
and instead defers any discussion on features of countries’ contributions 
to further negotiations. 

Unconditionally, the INDCs are expected to result in global GHG
emissions of about 55 (52-58; 10th-90th percentile range over all studies
unless otherwise stated) billion metric tonnes of annual CO2-equivalent
emissions (Gt CO2-eq yr⁻¹; Box 1, Fig. 1, Supplementary Text 1) in 2030.
This is a reduction of around 9 (7-13) Gt CO2-eq yr⁻¹ by 2030 relative
to the median no-policy baseline scenario estimate and of around
4 (2-8) Gt CO2-eq yr⁻¹ relative to the median current-policy scenario
estimate (Supplementary Table 5). Putting this into context, global
GHG emissions in 2010 are estimated at about 48 Gt CO2-eq yr⁻¹
(46-50 Gt CO2-eq yr⁻¹ range across studies, Supplementary Table 2), and
our median no-policy baseline estimate reaches about 65 Gt CO2-eq yr⁻¹
by 2030.
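
To put these medians side by side, the short sketch below computes the change in 2030 emissions relative to 2010 for the unconditional INDC and no-policy baseline cases. It uses only the rounded median values quoted above; the variable and function names are illustrative. Note that the difference of the rounded medians (10) need not equal the reported median reduction of around 9, presumably because of rounding and because the reduction is evaluated per study.

```python
# Median estimates quoted in the text, in Gt CO2-eq per year.
EMISSIONS_2010 = 48.0
INDC_UNCONDITIONAL_2030 = 55.0
NO_POLICY_BASELINE_2030 = 65.0


def percent_change(new, old):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


indc_growth = percent_change(INDC_UNCONDITIONAL_2030, EMISSIONS_2010)
baseline_growth = percent_change(NO_POLICY_BASELINE_2030, EMISSIONS_2010)
difference_of_medians = NO_POLICY_BASELINE_2030 - INDC_UNCONDITIONAL_2030

print(f"INDC 2030 vs 2010: {indc_growth:+.1f}%")
print(f"No-policy baseline 2030 vs 2010: {baseline_growth:+.1f}%")
print(f"Difference of rounded medians: {difference_of_medians:.0f} Gt CO2-eq per year")
```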

A number of countries place conditions—for example, the provision 
of international finance—on all or part of their INDC. Some countries 
(such as Mexico, Indonesia and Morocco) included a range of reduction 
targets in their INDC and attach conditions to the implementation of 
the more ambitious end. Other countries indicate that their entire INDC 
is conditional. Of the INDCs submitted by 12 December 2015, roughly 
45% came with both conditional and unconditional components; 
about a third was conditional only; and the remainder did not specify 
conditions. When we assume in our evaluation that all conditions are
met and conditional INDCs are fully implemented, estimated global
GHG emissions end up about 2.4 (1.2-4.8) Gt CO2-eq yr⁻¹ lower in
2030 compared to the unconditional INDC scenario case (full range 
across six available estimates, Supplementary Text 1, Supplementary 
Table 5). 

Comparing the INDC scenario (what countries propose as their
contribution to the international agreement) to the current-policy scenario
(what countries implement domestically) provides lessons on the extent
to which additional national policies are necessary to achieve the intended
2030 emissions reductions. Projected emissions under current policies
that match (or are lower than) those under the INDC can result either
from a proactive and coordinated domestic policy response consistent
with the INDC or from an INDC that is explicitly designed not to require
further policy effort. Likewise, projected emissions under current policy
that exceed those under the INDC can result from a relatively ambitious
INDC, from a lack of domestic climate policy, or a combination thereof.
Therefore, this comparison alone cannot adequately reflect the overall
level of ambition.

Figure 1 | Global greenhouse gas emissions as implied by INDCs
compared to no-policy baseline, current-policy and 2°C scenarios.
White lines show the median of each range. The white dashed line shows
the median estimate of what the INDCs would deliver if all conditions
are met. The 20th-80th-percentile ranges are shown for the no-policy
baseline and 2°C scenarios. For current-policy and INDC scenarios, the
minimum-maximum and 10th-90th-percentile range across all assessed
studies are given, respectively. Symbols represent single studies, and are
offset slightly to increase readability. Dashed brown lines connect data
points for each study. References to all assessed studies are provided in
Box 1. Scenarios are also described in Box 1.

For a number of countries (such as Russia and Ukraine), the INDC
targets suggest that emission levels above their estimated no-policy
baseline or current-policy scenario will be reached. These countries are thus
expected to overachieve their INDC targets by default. Under the rules
of the Kyoto Protocol, over-delivery on a target would have generated
surplus emission allowances by the quantity the target level is
overachieved. These allowances can then be traded with other countries, who
apply them to achieve their own GHG reduction target. Such a system
could also be developed under the Paris Agreement, which allows for
the voluntary use of “internationally transferred mitigation outcomes”.
However, the extent to which such a mechanism will ultimately be
developed and used remains unclear, because it will require features,
information and accounting of contributions to become much more precise
than they are now. Different modelling teams treat these surpluses in
different ways, which adds an uncertainty of about 1 Gt CO2-eq yr⁻¹ to
the estimates presented here.


Confounding factors

The literature synthesized in this assessment reveals a wide range of
estimates of future emissions under nominally similar scenarios (see small
symbols in Fig. 1). These differences can stem from a number of factors,
including modelling methods, input data and assumptions regarding
country intent. Our review identifies four key factors that contribute to
the discrepancies and differences between the various 2030 emissions
estimates.


Incomplete coverage

Several global and national sectors as well as countries are not covered
by INDCs. Often, emissions estimates for sectors that are not included
under INDCs range widely. This is the case for, for example, global
emissions from international aviation (despite an industry pledge
outside the UNFCCC) and maritime transport, or the national non-CO2
GHG emissions from China. Subtracting national sectors that are not
covered, INDCs cover at least 8 percentage points less of global
emissions than the 96% indicated earlier (Supplementary Text 2). Under the
Paris Agreement, developing countries are encouraged to move over time
to economy-wide targets, so that future analyses should become more
comprehensive. Countries that are not a UNFCCC Party or have not yet
put forward an INDC are also studied in less depth, but represent only a
diminishing amount of global emissions (about 1%-2%). Finally, studies
themselves make specific choices about which INDCs to cover or focus
on, which in turn influence projected emissions.


Table 1 | Estimates of global temperature rise for INDC and other scenario categories

Global-mean temperature rise by 2100 (in °C) that is not exceeded with the given probability

Scenario               50%                       66%                       90%
No-policy baseline     4.1 (3.5-4.5) [3.1-4.8]   4.5 (3.9-5.1) [3.4-5.4]   5.6 (4.8-6.3) [4.2-6.8]
Current policy         3.2 (3.1-3.4) [2.7-3.8]   3.6 (3.4-3.7) [2.9-4.1]   4.4 (4.2-4.6) [3.6-5.2]
INDC (unconditional)   2.9 (2.6-3.1) [2.2-3.5]   3.2 (2.9-3.4) [2.4-3.8]   3.9 (3.5-4.2) [2.8-4.7]
INDC (conditional)     2.7 (2.5-2.9) [2.1-3.2]   3.0 (2.7-3.1) [2.2-3.6]   3.7 (3.3-3.9) [2.6-4.4]

For each scenario, temperature values at the 50%, 66% and 90% probability levels are provided for the median emission estimates, as well as the 10th-90th-percentile range of emissions estimates
(in parentheses) and the same estimates when also including scenario projection uncertainty (in brackets). Temperature increases are relative to pre-industrial levels (1850-1900), and are derived
from simulations with a probabilistic set-up with the simple model MAGICC (refs 10, 68-70, Supplementary Text 3).


Uncertain projections

GHG emission projections of countries that have submitted INDCs are
uncertain, particularly if targets are not unambiguously translatable into
absolute emission reductions. Most INDCs do define straightforward,
absolute GHG emission targets (in units of CO2-eq in a given year or
period), or targets that can be relatively easily translated into absolute
levels (for example, a reduction from a fixed historical base year), but this
is not always the case. About 75 INDCs are defined relative to hypothetical
‘business-as-usual’ or reference scenarios in the absence of climate
policy. In some cases governments do not define their reference scenario,
and in other cases official projections differ substantially from those from
international and national modelling teams. Overall, these uncertainties
should become smaller, because the Paris decisions request countries to
ensure some methodological consistency of future submissions. Another
complicating factor is that several countries put forward targets that do
not directly specify emissions (such as a renewable energy target) or
targets on emissions intensity (for instance, improvements of the ratio
of carbon emissions, CO2, to economic output, GDP). If the expected
GDP growth rate is not provided, additional assumptions are required to
quantify the implied absolute level of GHG emissions and these
assumptions differ across modelling groups. For example, the estimated
emissions for China for 2030 under its INDC range from 12.8 Gt CO2-eq yr⁻¹
to 15.0 Gt CO2-eq yr⁻¹ in different studies. At least seven other
INDCs, including India’s, are subject to the same kind of uncertainties
(Supplementary Table 1). Finally, many countries (about 30,
amounting to approximately 6% of global emissions) include mere qualitative
descriptions of mitigation actions in their INDCs, which complicate a
precise quantification.
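
The dependence of an intensity target on the assumed GDP growth can be made concrete with a small sketch. Because intensity is emissions divided by GDP, the absolute emissions implied by a given intensity cut scale with cumulative GDP growth. All numbers below (base-year emissions, size of the intensity cut, time horizon and growth rates) are hypothetical and chosen only for illustration; they are not taken from any INDC or from the studies assessed here.

```python
def emissions_under_intensity_target(base_emissions, intensity_cut, gdp_growth, years):
    """Absolute emissions implied by an emissions-intensity target.

    Intensity = emissions / GDP, so the target-year emissions equal
    base_emissions * (1 - intensity_cut) * (1 + gdp_growth) ** years.
    """
    gdp_ratio = (1.0 + gdp_growth) ** years
    return base_emissions * (1.0 - intensity_cut) * gdp_ratio


# Hypothetical inputs (illustrative only, not from any INDC).
BASE_EMISSIONS = 10.0   # Gt CO2 per year in the base year
INTENSITY_CUT = 0.60    # 60% reduction in emissions per unit of GDP
YEARS = 25              # base year to target year

# The same target under three assumed average annual GDP growth rates.
implied = {
    growth: emissions_under_intensity_target(BASE_EMISSIONS, INTENSITY_CUT, growth, YEARS)
    for growth in (0.05, 0.06, 0.07)
}
for growth, emissions in implied.items():
    print(f"GDP growth {growth:.0%}: {emissions:.1f} Gt CO2 per year")
```

Under these assumed inputs, a difference of two percentage points in annual GDP growth changes the implied absolute emissions by tens of per cent, which is the kind of spread that arises when modelling groups have to supply their own growth assumptions.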


Land-use-related emissions 

Various approaches exist to account for emissions from land use, land- 
use change and forestry, and countries can use an accounting approach 
of their choice in their INDCs. Examples of possible approaches are 
to include land-use-related CO2 emissions and removals as part of the
national total, much like any other sector (an approach favoured by,
for example, Brazil and the USA), or to apply accounting rules similar to
the ones under the Kyoto Protocol (which are favoured by, for example,
the European Union and New Zealand, and possibly by Russia). These
accounting rules can have a substantial effect on the emissions of
individual countries in 2025 and 2030 and are associated with substantial
uncertainties. Although some INDCs explicitly exclude land-use-related 
emissions from their targets, many INDCs that include land use in their 
targets do not specify an accounting approach. 


Historical emissions and metrics 

Historical emission estimates come with their associated uncertainties. 
For example, recently, global 2010 GHG emissions have been estimated
at 49 Gt CO2-eq yr⁻¹ (±4.5 Gt CO2-eq yr⁻¹, 90% confidence interval).
Model teams apply their own estimate of historical emissions in their
INDC analyses (Fig. 1), and both INDCs and analysts use varying
metrics to translate GHG emissions into units of CO2-equivalence. Even if
these discrepancies can be harmonized or corrected for, their variation
increases the uncertainty surrounding INDC estimates. 


Optimal 2°C pathways 

Having quantified the GHG implications of the INDCs by 2030, the 
question remains whether these levels are consistent with the Paris 
Agreement’s aim of holding warming to well below 2°C. As indicated 
earlier, limiting warming to any level requires net CO2 emissions to
become zero at some point in time and, given the small remaining carbon
budget, this moment is estimated to be before the end of this century
for a 2°C limit. The Paris Agreement’s aim of reaching net-zero
GHG emissions in the second half of the century goes even further. For
some non-CO2 emissions, in particular those related to agriculture, only
limited mitigation options have been identified. Therefore, net-zero
CO2 emissions are always achieved before achieving net-zero GHG
emissions. Integrated energy-economy models are used extensively to model
pathways that can achieve this feat at global least cost. Here, we use the
Scenario Database that accompanied the Fifth Assessment Report (AR5)
of the Intergovernmental Panel on Climate Change (IPCC) to explore such
cost-optimal 2°C pathways from 2020 onward (Box 1). 

Comparing these cost-optimal 2°C scenarios to the INDC projections
shows a large discrepancy (Fig. 1). The median cost-optimal path towards
keeping warming to below 2°C (starting reductions in 2020) and the
emissions currently implied by the unconditional INDCs differ by about
14 (10-16) Gt CO2-eq yr⁻¹ in 2030. Even if the conditions that are linked
to some INDCs are met (see earlier), this difference remains of the order
of 11 Gt CO2-eq yr⁻¹. The high end of this range (16 Gt CO2-eq yr⁻¹)
corresponds roughly to the 2010 emissions of China and the USA combined; the
lower end (about 10.5 Gt CO2-eq yr⁻¹) to the sum of the emissions of Brazil,
the European Union, India and Russia. Thus, the INDCs clearly do not put
the world on a least-cost path towards limiting warming to well below 2°C.
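
As a rough cross-check of these gap figures, the sketch below subtracts the median effect of meeting all INDC conditions (about 2.4 Gt CO2-eq per year, quoted earlier) from the median unconditional gap, and expresses the unconditional gap as a share of the unconditional INDC level. It works with the rounded medians from the text only; combining medians by simple subtraction is an assumption, so the result is indicative rather than a reproduction of the underlying analysis.

```python
# Rounded median values quoted in the text, in Gt CO2-eq per year (2030).
GAP_UNCONDITIONAL = 14.0          # unconditional INDCs vs cost-optimal 2°C path
CONDITIONAL_REDUCTION = 2.4       # extra reduction if all conditions are met
INDC_UNCONDITIONAL_2030 = 55.0    # unconditional INDC emission level

# Assumption: medians are combined by simple subtraction.
gap_conditional = GAP_UNCONDITIONAL - CONDITIONAL_REDUCTION
gap_share = 100.0 * GAP_UNCONDITIONAL / INDC_UNCONDITIONAL_2030

print(f"Gap with conditions met: about {gap_conditional:.1f} Gt CO2-eq per year")
print(f"Unconditional gap as share of INDC level: {gap_share:.1f}%")
```

The subtraction gives a value between 11 and 12, in line with the statement that the gap remains of the order of 11 Gt CO2-eq per year when conditions are met.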

Any global emission scenario reflects an idealized representation of the 
world. This is not different for the cost-optimal 2°C scenarios that were 
used above as a reference. The strength of such cost-optimal scenarios lies 
in the fact that they provide an assessment of the potential for emission 
reductions in a world that collaborates globally towards limiting climate 
change and attempts to do this at lowest overall cost. Other scenarios in 
the literature model other, more imperfect futures, for example, those 
in which climate action is delayed by a few decades, in which
countries and regions are not collaborating from the beginning, or in
which the strength of local institutions affects the willingness to invest.
Such scenarios help us to explore the post-2030 implications of the cur- 
rent INDCs. 


Post-2030 implications of INDCs 

A large share of the potential warming until 2100 is determined not just 
by the INDCs until 2025 or 2030, but also by what happens afterwards. 
Several conceptual approaches can be followed to extend INDCs into 
the future, which basically assume that climate action stalls, continues or 
accelerates. Stalling action is often modelled by assuming that emissions 
return to a no-climate-policy trajectory after 2030; continuing action by 
assuming that the level of post-2030 action is similar to pre-2030 action 
on the basis of a metric of choice (for example, extrapolating INDC trends 
in terms of carbon-price development or emissions intensity of the econ- 
omy); and accelerating action by post-2030 action that goes beyond such 
a level. Because of the path-dependence and inertia of the global energy 
system, the INDCs have a critical role in preparing what can come
after 2030. 

Each of the above-mentioned approaches leads to different global 
temperature outcomes, even when starting from the same INDC assess- 
ment for 2030. It is therefore essential to spell out post-2030 assumptions 
to understand global temperature projections for the twenty-first cen- 
tury based on the INDCs. As a conservative interpretation of the Paris 
Agreement, we here assume that climate action continues after 2030 at 
a level of ambition that is similar to that of the INDCs (Supplementary 
Text 3). The assumption that climate action will continue or accelerate 
over time is supported by the Agreement’s requirement that the successive 
nationally determined contribution of each country must represent a pro- 
gression beyond the earlier contributions, and reflect the highest possible 
ambition, of that country. Stalling climate action after 2030 would be in 
contradiction with the provisions of the Paris Agreement. 

Figure 2 | Temperature implications of current INDCs. a, GHG emission
ranges (20th-80th percentile) of scenarios from the IPCC AR5 Scenario
Database with constant policy assumptions from 2010 onwards
(blue-to-green shaded ranges), grouped per estimated median global-mean
temperature increase in 2100 relative to pre-industrial levels (1850-1900),
and range of the scenario subset limiting warming to below 2°C by 2100
with 50%-66% likelihood (dark orange) from year-2030 INDC levels.
The vertical orange lines show the unconditional INDC range in 2025
and 2030, as shown in Fig. 1. The 2°C range shown in Fig. 1 starts global
least-cost mitigation action in 2020 instead of 2010 and is not included
here. b, Relationship between global GHG emission levels in 2030 and
median global-mean temperature increase by 2100 based on scenarios
shown in a. Each dot represents a single scenario. The blue line shows a
smoothing spline fit (R² = 0.93) and the blue-shaded area shows fits to the
5th and 95th percentile over all points. Comparing the central fit with the
range of year-2030 GHG emissions implied by the unconditional INDCs
shows that INDCs are roughly consistent with a median warming of
2.6-3.1°C by 2100 (horizontal dark-orange range), and a 2.2-3.5°C range
including scenario projection uncertainty (horizontal light-orange range).
Vertical dashed lines and shaded regions show year-2030 GHG estimates
for the various scenario sets. c, Annual CO2 reduction rates modelled in
scenarios limiting warming to below 2°C from year-2030 INDC levels
(dark-orange range in a; bars, median; vertical lines, spread across all
available scenarios) and historical examples (range for France, Sweden
and Denmark is based on ref. 74; see Supplementary Text 4). d, Implied
cumulative carbon emissions including uncertainties, and comparison to
budget ranges for not exceeding 1.5°C (with 50% probability) and 2°C
(with 66% probability) from refs 9 and 11 (dark bar, lower estimate; light
bar, high-range estimate). Historical estimates are from ref. 75. Vertical
lines show the range due to scenario spread (Supplementary Text 3 and
Supplementary Table 6). Arrows and bars in the first four columns show
the projected cumulative CO2 emissions until 2030 for each respective
scenario.

Under these assumptions of continued climate action, the 2030
unconditional-INDC emission range is roughly consistent with a
median warming relative to pre-industrial levels of 2.6-3.1°C (median,
2.9°C; full scenario projection uncertainty, 2.2-3.5°C; Table 1, Fig. 2b,
Supplementary Fig. 1), with warming continuing to increase afterwards.
This is an improvement on the current-policy and no-policy baseline
scenarios, whose median projections suggest about 3.2°C and more than 4°C
of temperature rise by 2100, respectively. The successful implementation
of all conditional INDCs would decrease our median estimate by an
additional 0.2°C, but would keep the outcome far from the world the Paris
Agreement is aiming for, with well-below 2°C and 1.5°C of warming.
Moreover, all above-mentioned values represent median projections.
Because the climate response to GHG emissions remains uncertain,
it is also possible that substantially higher temperatures will materialize
with compelling likelihoods (Table 1). For example, at the 66th percentile
level, warming under the unconditional INDCs is projected to be about
0.3°C higher (3.2°C, with a range of 2.9-3.4°C). Finally, the INDC cases
that we assess here will exceed the available carbon budget for keeping
warming to below 2°C by 2030 with 66% probability (that is, roughly
750-800 Gt CO2 implied emissions under the INDCs during the 2011-2030
period compared to the 750-1,400 Gt CO2 available; Supplementary
Text 3, Supplementary Table 6, Fig. 2d). The budget for never exceeding
1.5°C with a 50% probability (550-600 Gt CO2) will be entirely gone,
indicating that active removal of CO2 at a later point in time will be required
to return to within this budget (Supplementary Table 6). Median warming
under the INDCs is projected to cross the 1.5°C and 2°C limits by
2030-2045 and 2045-2075, respectively (Supplementary Fig. 4).
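
The budget comparison above can be written as simple interval arithmetic. The sketch below subtracts the cumulative 2011-2030 emissions implied by the INDCs from each budget range quoted in the text. Treating the two ranges as independent intervals is a simplifying assumption made here for illustration; the function name is hypothetical and the calculation is not the method of the underlying analysis.

```python
def remaining_budget(budget, emitted):
    """Interval arithmetic: (lowest, highest) remaining budget in Gt CO2."""
    budget_low, budget_high = budget
    emitted_low, emitted_high = emitted
    return (budget_low - emitted_high, budget_high - emitted_low)


# Ranges quoted in the text, in Gt CO2.
INDC_CUMULATIVE_2011_2030 = (750.0, 800.0)
BUDGET_2C_66 = (750.0, 1400.0)    # below 2°C with 66% probability
BUDGET_15C_50 = (550.0, 600.0)    # below 1.5°C with 50% probability

left_2c = remaining_budget(BUDGET_2C_66, INDC_CUMULATIVE_2011_2030)
left_15c = remaining_budget(BUDGET_15C_50, INDC_CUMULATIVE_2011_2030)

print(f"2°C budget left after 2030: {left_2c[0]:.0f} to {left_2c[1]:.0f} Gt CO2")
print(f"1.5°C budget left after 2030: {left_15c[0]:.0f} to {left_15c[1]:.0f} Gt CO2")
```

Under this simplification the remaining 1.5°C budget is negative across its whole range, whereas the 2°C budget is exhausted only at the lower budget estimate.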

The question thus arises whether global temperature rise can be kept 
to well below 2°C with accelerated action after 2030. Global scenarios 
that aim to keep warming to below 2°C and that achieve this objective
from 2030 GHG emissions similar to those from the INDC range have
been assessed in detail by recent large-scale model-comparison
projects. Our re-analysis of these scenarios shows that even with
accelerated action after 2030, options to keep warming to well below 2°C from
current INDCs are severely limited, particularly if some key mitigation 
technologies do not scale up as anticipated. This is easy to understand 
if one appreciates that even if all INDCs are successfully implemented 
by 2030, the 2°C carbon budget might already be virtually exhausted by 
that time (see earlier and Fig. 2d). The Paris Agreement does not define 
precisely what its “well-below 2°C” aim means. Typically, policymakers in 
the UNFCCC have been concerned about limiting warming to below 2°C 
with >66% probability. However, from current INDC levels, all available
internally consistent scenarios manage to limit warming to below 2°C 
with only a lower, 50%-66% probability, increasing the risks of climate 
change impacts. No scenarios are available that are consistent with both 
the current INDCs and a 1.5°C warming limit with 50% probability. 

The available scenarios show rapidly declining emissions after 2030, 
with global CO2 emissions from energy- and industry-related sources
reaching net-zero levels between 2060 and 2080. The global economy is
thus assumed to fully decarbonize in the time span of three to five decades
and from 2030 levels that are higher than today’s. Furthermore, about
two thirds of these scenarios achieve a balance of global GHG emissions
(as mentioned in the Paris Agreement) between 2080 and 2100. Because
some non-CO2 emissions are virtually impossible to eliminate entirely
(for example, those from specific agricultural sources), reaching such
a balance will involve net-negative CO2 emissions at a global scale to
compensate for any residual non-CO2 emissions, resulting in a gradual
decline in global-average temperatures over time. Technologies that might
be able to achieve this feat are still surrounded by important uncertainties
(see below). In general, lower near-term emissions allow for a later
timing of reaching global net-zero CO2 emissions (see 1.5-2°C versus
dark-orange range in Fig. 2a) and, moreover, reduce the overall future
reliance on negative emissions technologies.

To illustrate the challenges involved, we take a critical look at some 
characteristics of the scenarios. Scenarios that broadly follow the INDCs 
until 2030 and still manage to keep warming to below 2°C (with 50%- 
66% probability only) are associated with a very rapid decline in CO2 
emissions from energy- and industry-related sources after 2030. The 
decarbonization between 2030 and 2050 is particularly decisive in these 
scenarios. For this period, the scenarios show average rates of decline
in annual emissions of about 3.5% (2.0%-4.2%, full range across scenarios;
Supplementary Text 4). To understand what this means in a historical
context, it makes sense to distinguish between (1) the phase-out of CO2
generation over time (a proxy for the reduction in fossil-fuel use and
upscaling of low-carbon energy sources) and (2) the required upscaling
of industrial-scale CO2 sequestration with carbon capture and geological
storage (CCS) technologies. The latter mitigation option has not been
applied in the past. It can thus be seen as an additional technological
option that is included in scenarios, but for which there is no
historical experience to draw on.
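
To give a feel for these rates, the short sketch below compounds a constant annual decline over the 2030-2050 period. Treating the quoted averages as constant year-on-year (compound) rates is an assumption made here for illustration only, and the function name is hypothetical.

```python
def fraction_remaining(annual_decline, years):
    """Fraction of the starting emissions left after compounding a constant
    fractional decline (e.g. 0.035 for 3.5% per year) over `years` years."""
    return (1.0 - annual_decline) ** years


# Median and full-range decline rates quoted in the text, applied for 20 years
for rate in (0.020, 0.035, 0.042):
    left = fraction_remaining(rate, 20)
    print(f"{rate:.1%} per year for 20 years -> {left:.0%} of 2030 emissions remain")
```

Under this assumption, a decline of 3.5% per year roughly halves annual emissions between 2030 and 2050.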

In the 2°C scenarios that start from INDC levels in 2030 (dark-orange 
range in Fig. 2a), CO2 generation is reduced at a median annual rate of 
about 2.3% (0.0%-3.3%, full range) between 2030 and 2050. Historically, 
countries have been able to achieve reductions in CO2 generation at rates
of about 2%-3% per year as a result of dedicated (energy-security) policies
(Fig. 2c; Supplementary Text 4). Limiting warming to below 2°C
from year-2030 INDC levels thus implies that the pace of such a precip- 
itated phase-out of fossil-fuel use needs to be replicated globally. These 
historical reductions were all achieved for non-climate reasons, with a 
focus on energy security and not on emissions reductions. There is thus 
no clear historical analogue for reductions under a dedicated and strin- 
gent climate policy. The challenge remains nevertheless important. This 
becomes even clearer when appreciating that all historical analogues for 
reductions were achieved in highly developed countries, such as France, 
Sweden and Denmark. Achieving similar results in developing countries,
with energy-intensive sectors that are still growing and with weaker
institutional frameworks, higher investment risks and less capacity,
will be more difficult, but, at the same time, readily available low-cost
zero-carbon alternatives could also allow those economies to leap-frog 
carbon-intensive development in some sectors. 

Scenarios complement the global phase-out of CO2 generation with 
a scale-up of CCS infrastructure to capture and geologically store part 
of the CO2 that continues to be generated. This scale-up is massive in
scenarios that limit warming to below 2°C from INDC levels. Because
such scenarios have limited CCS deployment until 2030, the annual rate
of CO2 sequestration is assumed to increase 10- to >100-fold in the 2030-2050
period, reaching about 10 Gt CO2 yr⁻¹ in 2050 (8-14 Gt CO2 yr⁻¹
range). To put this challenge into perspective, about 85 GW (measured
in coal-equivalent power generation; Supplementary Text 4)
of new CCS capacity would need to be installed each year to capture
this amount by 2050, which corresponds roughly to the combined
capacity of solar and wind power generation that is annually globally
installed today (Supplementary Fig. 2). Altogether, the global
energy-system transition that is required to limit warming to well below 
2°C and further to 1.5°C is unprecedented. 
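
The scale-up figures quoted above can be turned into implied growth rates with a few lines of arithmetic. The sketch below assumes steady exponential growth over the 20 years from 2030 to 2050 and a constant installation rate of 85 GW per year; both are illustrative simplifications, and the function name is hypothetical.

```python
def implied_annual_growth(fold_increase, years):
    """Constant yearly growth rate that yields `fold_increase` after `years` years."""
    return fold_increase ** (1.0 / years) - 1.0


YEARS = 20  # 2030-2050

for fold in (10, 100):
    rate = implied_annual_growth(fold, YEARS)
    print(f"{fold}-fold over {YEARS} years -> about {rate:.1%} growth per year")

# Cumulative coal-equivalent capacity if about 85 GW is added every year
print(f"Total added by 2050: {85 * YEARS} GW")
```

A 10- to 100-fold increase thus corresponds to sustained growth of roughly 12% to 26% per year, and 85 GW per year adds up to 1,700 GW over the two decades.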

Finally, scenarios often combine CCS with biomass energy (abbreviated 
as BECCS) as a way of actively capturing and removing CO2 from the
atmosphere. Although in principle this is technically possible, deployment
of such technologies at scale is untested, and could be controversial
because of concerns about public acceptance or because of their competition with
food production over land and water. A recent review showed that,
assuming agricultural practices and yields do not change over the
twenty-first century, removing CO2 could require large amounts of land. At the
same time, other assessments concluded that it might be possible
to produce the required amount of bio-energy in a sustainable way (up
to 300 EJ yr⁻¹, see box 11.5 in ref. 39). The importance of the land-use
question for policy is highlighted by the decision of the IPCC to dedicate 
one of its three upcoming Special Reports to questions of sustainable land 
management and food security. Exploring futures in which a global bal- 
ance of GHG emissions can be achieved in the second half of this century 
with technically feasible and societally acceptable technologies represents 
a major research challenge emerging from the Paris Agreement. This
challenge is particularly relevant to policy, because limiting emissions in
2030 does not only increase the chances of attaining the 2°C target, but
also reduces the need to rely on unproven, potentially risky or controversial
technologies in the future.


Decreasing the post-2030 challenge 

The post-2030 challenge to limit warming to below 2°C from current 
INDC levels is daunting, and pursuing efforts for 1.5°C even more so. 
However, the overall challenge can be minimized by additional GHG 
reductions in the near term. In this context, near-term means
before and by 2030. Besides (i) the option of countries increasing the 
overall ambition of their INDCs, we identify several other options 
that can contribute to this (see Table 2 for an overview). The options 
include: (ii) increasing the coverage of INDCs to more sectors and gases; 
(iii) including international sectors such as aviation and international 
maritime transport; (iv) implementing measures that enable over- 
delivery on the INDCs; (v) increasing contributions to international 
climate finance and international cooperation on technology develop- 
ment, transfer and diffusion; and (vi) promoting and implementing addi- 
tional national, sub-national and non-state initiatives. These options are 
not fully additional; some of them overlap (strongly) with the INDCs, 
and their precise contributions thus remain speculative (see Table 2). 
However, several indications suggest that such an increase in ambition 
is possible. 

First, increasing ambition over time is a key component of the Paris 
Agreement framework. For example, countries are requested to submit 
new—or update existing—contributions that should represent a pro- 
gression beyond their earlier commitments. The certainty of the new 
global climate agreement, together with the improving cost and availability
of low-carbon technologies, might help countries to consider

Table 2 | Overview of options to reduce post-2030 challenge

Option: (i) Increasing ambition of existing 2025 and 2030 contributions
Description: The outcome of the Paris climate summit provides several opportunities to increase ambition of national contributions by 2030, for example, through consecutive five-year cycles during which national contributions increase in ambition.
Possible impact on global emissions in 2030 (order of magnitude): 10 Gt CO2-eq yr⁻¹. Theoretical potential to embark on a least-cost 2°C pathway after 2020.

Option: (ii) Increasing coverage of sectors and gases
Description: Some countries cover only part of their total GHG emissions and some sectors in their contributions; for example, some contributions apply only to CO2 and not to other GHGs. Extending INDCs to all sectors and gases would increase the global coverage of INDCs.
Possible impact on global emissions in 2030 (order of magnitude): 0.1-1 Gt CO2-eq yr⁻¹†

Option: (iii) Including international sectors
Description: At present, the contributions cover only countries. International sectors, such as international aviation and maritime transport, can also be included. These sectors covered around 2% of global emissions in 2010‡, with an increasing trend.
Possible impact on global emissions in 2030 (order of magnitude): 0.1-1 Gt CO2-eq yr⁻¹ (ref. 71)

Option: (iv) Implementing domestic measures that enable over-delivery on the INDCs*
Description: Countries can implement domestic measures that go beyond the actions described in the current INDCs.
Possible impact on global emissions in 2030 (order of magnitude): 10 Gt CO2-eq yr⁻¹. Theoretical potential to embark on a least-cost 2°C pathway.

Option: (v) Increasing climate finance and international cooperation*
Description: Additional international climate finance and cooperation on technology development, transfer and diffusion could help to (over-)achieve the conditional end of the national contributions.
Possible impact on global emissions in 2030 (order of magnitude): 1 Gt CO2-eq yr⁻¹. Estimate for moving from unconditional to conditional INDCs§. No estimate available for additional reductions.

Option: (vi) Implementing international cooperative initiatives*
Description: Action could be implemented by ambitious sub-national or regional governments, companies, organizations, non-governmental organisations and citizens to further reduce emissions. The amount of overlap of these initiatives with national contributions remains unclear.
Possible impact on global emissions in 2030 (order of magnitude): 1 Gt CO2-eq yr⁻¹ in 2020‖. No comprehensive estimates available for 2030.

Further reducing global GHG emissions by 2030 reduces the post-2030 challenge for limiting global-mean temperature increase to below 2°C relative to pre-industrial levels. This table provides an overview of options to improve pre-2030 emissions reductions. These options can overlap to smaller or larger degrees with the nationally determined contributions indicated in option (i); consequently, they are not fully additive. Estimates of the potential impact on global emissions are highly uncertain and therefore only the expected order of magnitude (nearest power of 10) is provided for each option.
*Options that overlap to smaller or larger degrees with option (i).
†See Supplementary Text 2.
‡Source: Emission Database for Global Atmospheric Research (EDGAR), http://edgar.jrc.ec.europa.eu/.
§See Supplementary Table 5.
‖Estimate based on refs 64, 65 and 73.


strengthening their post-2020 contributions. Second, countries can 
undertake further domestic measures. Because many countries have 
undergone national stakeholder processes in preparation of their INDCs, 
they could now be in a better position to consider additional policies.
Sub-national actors such as cities and regional governments may take 
further action, and non-state actors can also help to overachieve INDCs. 
The Paris conference saw unprecedented willingness to act by these stake- 
holders, with more than 1,000 non-state actors signing the Paris Pledge 
for Action (http://www.parispledgeforaction.org/), signalling that they are 
willing to support efforts to meet and exceed the ambition of governments 
for keeping the world on a 2°C trajectory. This role of non-Party stake- 
holders is acknowledged more clearly than ever before in the official Paris 
decisions. However, although the theoretical potential of these activities 
is huge, their additional impact is very hard to quantify, and it remains 
unclear whether these initiatives are additional to the already pledged 
national contributions. 


Outlook 

Covering most of the world’s GHG emissions with climate plans in the 
form of voluntarily submitted INDCs is a historic achievement. The 
Paris Agreement requires the submission of successive, increasingly 
ambitious, nationally determined contributions that are subject to strong 
transparency guidelines, as well as a global stock-take, in the light of 
equity and science, every five years. The optimism accompanying this 
process has to be carefully balanced against the important challenges that 
current INDCs imply for post-2030 emissions reductions. Even starting 
today, limiting warming to no more than 2°C relative to pre-industrial 
levels constitutes a societal challenge; at the same time, the warming 
projected from current INDCs constitutes an important challenge on 
its own in terms of coping with climate impacts. The nationally deter- 
mined contributions constitute a new era for climate policy under the 
Paris Agreement, and represent both an invitation and a call for further 
action. Furthering greater reductions in the coming decade and preparing 
for a global transformation of development pathways is critical. 


Two developments look particularly promising to us. First, it becomes 
increasingly clear to decision-makers that measures to reduce GHG 
emissions have multiple socio-economic benefits. The action by virtually
all countries improves prospects for further collective action,
which must be the fundamental basis of any adequate response to climate 
change. Therefore, it becomes easier to conceive additional measures or 
strengthen existing ones. Second, the recent unprecedented engagement 
of non-state actors such as businesses, citizens and religious organiza- 
tions illustrates a more profuse awareness and an increased momentum 
for climate action. Given the large potential for emissions reductions 
as a result of both of these options, supporting and enabling national 
and non-state action will be critical. This insight also opens important 
avenues for future research and assessment. The research community 
will have to break from a one-sided climate-policy-centred approach 
and develop new concepts and frameworks that further the achievement 
of a portfolio of societal objectives, including climate, food and energy 
security, public health, and other goals of the sustainable development 
agenda. Charting development pathways that can hold warming well
below 2°C will thus require a renewed effort of the social and physical 
science communities alike. 

Oxygen isotope records from Chinese caves characterize changes in both the Asian monsoon and global climate. Here,
using our new speleothem data, we extend the Chinese record to cover the full uranium/thorium dating range, that is, the 
past 640,000 years. The record’s length and temporal precision allow us to test the idea that insolation changes caused by 
the Earth’s precession drove the terminations of each of the last seven ice ages as well as the millennia-long intervals of 
reduced monsoon rainfall associated with each of the terminations. On the basis of our record’s timing, the terminations 
are separated by four or five precession cycles, supporting the idea that the ‘100,000-year’ ice age cycle is an average of 
discrete numbers of precession cycles. Furthermore, the suborbital component of monsoon rainfall variability exhibits 
power in both the precession and obliquity bands, and is nearly in anti-phase with summer boreal insolation. These 
observations indicate that insolation, in part, sets the pace of the occurrence of millennial-scale events, including those
associated with terminations and ‘unfinished terminations’.


The seasonal cycle of solar heating over Asia gives rise to the Asian 
monsoon (AM), a vast system of overturning atmospheric circulation 
that transports heat and moisture during boreal summer across the 
Indian Ocean and the tropical western Pacific into the Indian subcontinent
and southeastern Asia, and as far as northern China and Japan
(Extended Data Fig. 1). Cave climate records have been important in
characterizing AM changes and their causes. Such records demonstrate
large and, in many cases, abrupt changes in monsoon intensity, inferred
to have affected large swaths of Asia. A hallmark of these records is
the precision with which age can be determined with modern U-Th
dating methods, thus allowing direct comparison with the orbital
cycles without requiring orbital tuning. This approach, however, has
been hindered by limited temporal coverage. Here we report a record
from China, which, together with previously published data, covers
the complete U-Th dating range from 640,000 years (640 kyr) ago to
the present. Previous Chinese cave studies have demonstrated a close
correspondence between changes in the AM and shifts in Northern
Hemisphere summer insolation (NHSI) on orbital timescales and a
close relationship between the AM and climate in the North Atlantic
region on millennial scales. The latter has been used to correlate
monsoon records with records from the North Atlantic region. Of
note are Heinrich stadials (HSs) or ice rafted debris (IRD) events of
North Atlantic origin, some of which coincide with Weak Monsoon
Intervals (WMIs) in China. This correlation has been used to transfer
the cave chronology to the marine oxygen isotope record, a strategy that
has been important in establishing the timing of ice age terminations
and which we apply here. 

Our new δ18O data from Sanbao Cave, China, allow us to establish
the timing of Terminations (T) V through to VII in addition to the previously
determined timing of terminations back to T-IV (ref. 4). First,
this allows us to test ideas about the 100-kyr pacing of late Pleistocene
ice age cycles and the degree to which termination timing is consistent
with obliquity and/or precession forcing. Second, after removing
the component of the AM that correlates with insolation, we examine
the residual suborbital variation over the full record. We show that
some aspects of millennial-scale variability relate to orbital geometry.
Third, the full record now crosses the Mid-Brunhes Event (MBE), when
the character of CO2 and ice volume cycles changed. We assess the
degree to which these changes affected the AM. Fourth, we estimate the
timing and duration of maximal AM strength over the Marine Isotope
Stage (MIS) 11, a period of time which can be used as an analogue to
the Holocene and future climate because of similar orbital geometry.


Samples and results 

Sanbao Cave is on the northern slope of Mt Shennongjia in central China 
(110° 26’ E, 31° 40’ N, elevation 1,900 metres above sea level). Mean 
annual temperature is 8°C, and mean annual precipitation is 1,950 mm,
80% of which occurs during the summer (June to August). Four new
stalagmites were collected ~1,500 m from the cave entrance. Samples
were dated by a recently improved 230Th dating technique, yielding
precise age control (for example, ±1.5 kyr at the time of T-V) (Extended
Data Figs 2, 3 and Supplementary Table 1). New δ18O measurements
have a temporal resolution of between 200 and 70 years (average
of ~120 years) (Extended Data Fig. 4 and Supplementary Table 1). The
replication test (Extended Data Figs 2, 4) and other lines of reasoning
suggest that speleothem δ18O variability results from changes in the
δ18O of precipitation.

The climate interpretation of changes in the cave δ18O records from
China remains a subject of intense debate. However, most studies support
one or both of the ideas presented in the original studies. Yuan
et al. invoked Rayleigh fractionation to show that changes in the fraction
of water vapour rained out between tropical sources and the cave
site could account for the observed variability in the cave records. Most
modelling studies (Liu et al. and references therein) support this idea,
although most refer to the process as ‘upstream depletion’. Cheng et al.
proposed that changes in the fraction of low-δ18O monsoon rainfall in
annual totals could also explain the record. Recent theoretical and
empirical studies support this idea, with the latter showing that both
processes can affect Chinese cave δ18O. For both, lower δ18O implies
higher spatially integrated monsoon rainfall between the tropical monsoon
sources and the cave site and/or higher summer monsoon rainfall
in the cave region. Thus, in this study, we use the terms ‘strong monsoon’
and ‘weak monsoon’ to refer to low and high cave δ18O, respectively,
consistent with results from theoretical and empirical studies.

Our new records span from 640 to 330 kyr BP (before present, where
present = AD 1950), which together with previous records (from 384 kyr BP
to present), allow the construction of a composite AM δ18O record,
covering the past 640 kyr (Extended Data Fig. 4). The record is characterized
by millennial-scale variations superimposed on a quasi-sine-wave-like
orbital-scale variability that broadly tracks 21 July NHSI
(Fig. 1). Removal of orbital-scale variations yields a record of the
suborbital variability of the AM (the Δδ18O record) (Extended Data
Figs 5, 6). Detrending methods (for example, choice of insolation curve)
could introduce artefacts in the Δδ18O record, for which we tested by
removing the orbital component of the record using insolation curves
from a range of times encompassing the boreal summer months.
Similar Δδ18O power spectra independent of detrending curve suggest
that this artefact is not significant. Detrending methods and sensitivity
tests are described in the legends of Extended Data Figs 5, 6 and in 
Methods. 
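
The idea of removing an orbital component to expose suborbital variability can be illustrated with a toy example. The sketch below is not the procedure described in Methods: it assumes, purely for illustration, that the orbital component is removed by an ordinary least-squares fit of the δ18O series against an insolation curve, and it runs on synthetic data. All names and parameter values are hypothetical.

```python
import math


def detrend_against(series, reference):
    """Return the residuals of `series` after removing its least-squares
    linear fit on `reference` (two equal-length lists of floats)."""
    n = len(series)
    mean_s = sum(series) / n
    mean_r = sum(reference) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(reference, series))
    var = sum((r - mean_r) ** 2 for r in reference)
    slope = cov / var
    return [(s - mean_s) - slope * (r - mean_r) for r, s in zip(reference, series)]


# Synthetic data: a 23-kyr "insolation" cycle plus a 5-kyr suborbital wiggle
times = [0.5 * i for i in range(1280)]  # kyr, 0 to 639.5
insolation = [math.sin(2 * math.pi * t / 23.0) for t in times]
wiggle = [0.3 * math.sin(2 * math.pi * t / 5.0) for t in times]
# Lower d18O stands for a stronger monsoon, hence the minus sign
d18o = [-1.5 * ins + w for ins, w in zip(insolation, wiggle)]

residual = detrend_against(d18o, insolation)
max_error = max(abs(res - w) for res, w in zip(residual, wiggle))
print(f"largest difference between residual and injected wiggle: {max_error:.4f}")
```

In this toy setting the residual recovers the injected short-period signal almost exactly, because the two periods are nearly uncorrelated over the length of the record.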


Timing and character of terminations 

The gradual build-up and rapid termination of ice ages with an ~100-kyr 
cycle are a well-known feature of the past ~650 kyr (ref. 9). Although both 
glacial cycles and changes in eccentricity share common spectral power, 
the latter generates negligible change in insolation, thus presenting an 
enduring climate puzzle, the so-called ‘100-kyr problem’. A number
of hypotheses have been put forth to address this problem. One hypothesis
explains the 100-kyr cycle as an average of 4 to 5 discrete precession
cycles, with missed beats in between. Another invokes 2 to 3
obliquity cycles, again with missed beats. Yet another invokes a
combination of both obliquity and precession. Others call for interactions
involving internal oscillations in the Earth system.

Cheng et al. have shown that each of the last four terminations
is characterized by one or two WMIs, which coincide with HSs
observed in North Atlantic marine cores. Abrupt WMI endings are
synchronous with abrupt increases in atmospheric CH4 in Antarctic
ice cores. Using these cave-marine and cave-ice core correlations,
Cheng et al. placed the events observed in marine and ice cores on
a cave chronology and made the following observations: the WMIs
correlated with a good portion of each marine termination; the WMIs
and the marine terminations took place at a time of rising NHSI; and
most of the CO2 rise associated with each termination took place during
the WMIs (Extended Data Fig. 7). On the basis of these observations,
Cheng et al. suggested that for each termination, the rise in
insolation triggered the initial melting of the ice sheets. The North
Atlantic cold anomaly that resulted from input of ice and meltwater
rearranged oceanic and atmospheric circulation, causing the WMIs
and resulting in the rise in atmospheric CO2. The latter, along with a
continuing rise in insolation, drove the termination.

The unparalleled length and temporal precision of our cave record 
allow us to extend the aforementioned approach to robustly test ideas 
about the classic ‘100-kyr problem’. Our data indicate that glacial terminations
T-VII to T-V were also associated with WMIs (Fig. 2). The
T-V WMI occurred between ~430.5 ± 1.5 and ~426 ± 2 kyr BP. The
T-VI WMI, centred at 532.3 ± 3.5 kyr BP, has a duration of ~4.5 kyr,

Figure 1 | Asian monsoon variations in the context of the Earth’s
orbital parameters. a-c, Changes in obliquity (a), eccentricity (b) and
precession (c). d, The composite AM δ18O record (green; this study) and
21 July insolation at 65° N (ref. 45; pink). e, Termination pacing and duration.
Vertical bars mark the timing of WMIs correlated to glacial terminations
(grey) and two similar events (MIS 4/3 and 5.2/5.1 transitions) (yellow).
The timing of T-IIIa-WMI in this study differs from the one described in
ref. 4, although we consider the latter a plausible alternative (see main text
and Extended Data Fig. 9). f, The composite sea level. The timings of
MBE, MIS 11, 7.3, 7.4, 15.1 and 15.2 are also depicted.

Figure 2 | Comparison of climate events surrounding terminations and
two other millennial-scale events. A, Termination events surrounding
T-V to T-VII. a, 21 July insolation at 65° N (pink, W m⁻²) and the AM
δ18O record (green). b, d, EDC CH4 (ref. 19) and relative temperature
records, respectively. c, The composite CO2 record. The ice core ages
(EDC3 chronology) around T-VI and T-VIIa are shifted to the older side
by 3 and 2 kyr, respectively, to match the abrupt AM and CH4 changes.
e, f, Marine ODP980 (T-V) and U1314 (T-VI to T-VII) IRD and
benthic δ18O records, respectively. The marine records around T-V, T-VI,
T-VIIa and T-VII were shifted to the older side by 6 kyr, the younger
side by 2 kyr, older side by 2 kyr and older side by 3 kyr, respectively,
to match corresponding IRD events to WMIs. g, The composite sea
level. B, Climate events surrounding MIS 4/3 and 5.2/5.1 transitions.


assuming a linear growth rate of sample SB-32 around this time period. 
The T-VII WMI is broadly similar to that of T-II, T-IV and T-V, ending
abruptly at 627 ± 6 kyr BP. Each presumably coincides with a major HS,
a sea level rise marking each glacial termination, a CO2 rise, and an
Antarctic temperature rise (Fig. 2). These observations, now for the past
seven terminations, support the hypothesis that rising NHSI triggers
an initial ice-sheet disintegration, which in turn perturbs the oceanic
and atmospheric heat and carbon cycles, resulting in a CO2 increase,
which further drives the termination.

All seven terminations occurred during the rising limbs of NHSI 
separated by 4, 5, 5, 4, 5, and 5 precession cycles. The durations between
successive terminations from T-VII to T-I were about 93, 105, 92, 92,
113 and 115 kyr, respectively, rather than strict 100-kyr cycles (Fig. 1).
Thus, the ‘100-kyr cycle’ is an approximate average of intervals that are
generally a little longer or shorter than 100 kyr. In addition, we characterize
two ‘extra terminations’ revealed in marine records as follows:
T-IIIa (ref. 12) occurred one precession cycle after T-III (between MIS
7.4 and 7.3), and T-VIIa occurred two precession cycles after T-VII
(between MIS 15.2 and 15.1) (Fig. 1). Both exhibit a pattern of events
similar to the seven main terminations, and large and comparable
marine δ18O or sea level changes (Figs 1, 2).
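
The statement that the ‘100-kyr cycle’ is an average of discrete precession cycles can be checked with the numbers quoted above. The short sketch below uses only the six durations and six cycle counts given in the text; the variable names are hypothetical, and the durations are the approximate values as quoted.

```python
durations_kyr = [93, 105, 92, 92, 113, 115]   # intervals from T-VII to T-I, from the text
precession_cycles = [4, 5, 5, 4, 5, 5]        # precession cycles separating the terminations

# Average spacing between successive terminations
mean_interval = sum(durations_kyr) / len(durations_kyr)

# Average length of one precession cycle implied by the totals
mean_cycle = sum(durations_kyr) / sum(precession_cycles)

print(f"mean interval between terminations: {mean_interval:.1f} kyr")
print(f"implied mean precession cycle: {mean_cycle:.1f} kyr")
```

The intervals sum to 610 kyr over 28 precession cycles, which gives a mean spacing of about 102 kyr and an implied mean cycle length of about 22 kyr.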

Each of the nine terminations is separated from adjacent termina- 
tions by an integral number of precession cycles. Thus, insolation inten- 
sity is critical in controlling the timing of terminations. In contrast, 
termination timing does not exhibit an obvious relationship to any 
particular portion of the obliquity or eccentricity cycles. For example,
terminations take place when eccentricity is high (>0.02: T-II,
T-IIIa, T-III, T-IV, T-VIIa, T-VII) and when it is low (<0.02: T-I, T-V,

a, Greenland ice core (NGRIP) δ18O record. b, 21 July insolation at 65° N
(pink, W m⁻²) and AM δ18O record (green). c, e, EDC CH4 (ref. 19)
and relative temperature records, respectively. d, The composite CO2
record. The ice core ages around MIS 4/3 and 5.2/5.1 transitions are
shifted to the older side by 0.5 and 1 kyr, respectively, to match the abrupt
AM and CH4 changes. f, g, ODP980 IRD and benthic δ18O records,
respectively. On the basis of correlations between IRD events and WMIs,
the ODP980 ages are shifted to the younger side by ~4 and 3 kyr around
MIS 4/3 and 5.2/5.1 transitions, respectively. h, The composite sea level.
Its chronology around the MIS 4/3 transition is shifted to the older side by
~5 kyr. Vertical grey bars indicate WMIs and corresponding events (CO2,
Antarctic temperature, and North Atlantic IRD events). Dashed lines
depict correlations between abrupt intensification in AM and CH4 jump.


T-VI) (Fig. 1). Thus, changes in eccentricity are not, in a direct fashion,
responsible for the pacing of the ‘100-kyr cycle’. Similarly, terminations
take place as obliquity is increasing (T-IIIa, T-V, T-VII), decreasing
(T-III, T-VI), and at peak or near peak values (T-I, T-II, T-IV, T-VIIa).
Thus, as with eccentricity, obliquity does not pace terminations in a
precise fashion. However, as none of the nine events takes place when
obliquity is substantially below average, obliquity may play some causal
role.


Pacemaker of millennial-scale events 

A number of studies have focused on the cause of skipped precession 
and/or obliquity beats between terminations. More recent studies
have referred to the events at some of these ‘missed beats’ as
‘low-amplitude versions of terminations’, ‘failed terminations’, or
‘unfinished terminations’. For example, the MIS 4/3 and MIS 5.2/5.1
transitions took place at times of NHSI rise and have many of the
features of full terminations, including periods of unusually weak monsoon
(Fig. 2). Akin to T-VI, however, they do not result in extended
interglacial conditions, but are characterized by significant sea level
rise with larger marine δ18O shifts than T-VI (Figs 1, 2). Explanations
for full versus small-scale terminations focus on interplays and feedbacks
among factors internal to the climate system, such as ice volume
dynamics, in addition to orbital forcing.

As all full terminations and, at least, some small-scale termina- 
tions are associated with millennial-scale intervals of unusually weak 
monsoon, we examine the suborbital component of the full record to 
probe its relationship to orbital geometry. We start by removing orbital-scale
frequencies to create a residual Δδ18O record (Fig. 3), in which

Figure 3 | Comparison of suborbital AM and Antarctic temperature
variations. A, Interval from 350 to 0 kyr BP. B, Interval from 650 to
300 kyr BP. In both A and B: a, the composite AM δ18O record (green)
and 21 July insolation at 65° N (pink); b, suborbital AM variation
(Δδ18O, detrended from the composite δ18O record by subtracting 21 July
insolation at 65° N; see Methods); c, Antarctic suborbital temperature


suborbital variability is clearly observed®. Prominent features are the 
aforementioned WMIs, millennial-scale features characterized by large 
positive anomalies in 6'°O at terminations. A closer look shows many 
smaller amplitude, millennial-scale, high-6!8O events throughout, 
including those correlated with small-scale terminations. We infer 
that these events may share a similar origin with the WMIs, since the 
pattern of events surrounding these smaller events is indeed similar to 
terminations, including some for which the marine §'80/sea level shift 
is comparable or larger than that of T-VI (Fig. 2). Specifically, we infer 
that the smaller amplitude events are caused by the decay of the north- 
ern ice sheets, resulting in the flux of ice and meltwater into the North 
Atlantic; the ensuing slowdown in Atlantic Meridional Overturning 
Circulation (AMOC) generates a cold anomaly over the North Atlantic. 
Through an atmospheric teleconnection, the cold anomaly results in a 
weaker AM, recorded as a high 6!8O anomaly in our record (Fig. 2)!48. 

Spectral analysis of the A8'8O record reveals strong power in the 
precession band, with the suborbital component of the AM close to 
anti-phased with (that is, ~180° from) 21 June insolation (Fig. 4b). 
The suborbital spectrum also exhibits weaker but significant power 
in the obliquity band, again with the suborbital component close to 


variation (ASD record detrended from the 6D record!’; see Methods). 
Orange curves show 21 June insolation at 65° N (ref. 45) on a reversed 
scale (increasing down) for comparison. A remarkable similarity is evident 
between suborbital variations in AM A&'8O and Antarctic A&D records 
(also see Extended Data Fig. 8). 


anti-phased with obliquity. Thus, insolation modulates the suborbi- 
tal component of AM variability, but in the opposite sense from the 
more direct control of orbital-scale variability of the AM. We take these 
observations to indicate that high NHSI, whether in the precession or 
obliquity bands, favours the disintegration of the northern ice sheets 
and the release of ice and meltwater into the North Atlantic. This signal 
then propagates through the ocean and atmosphere as outlined above 
(and previously described for WMIs), resulting in the weakening of 
the monsoon*”®, Inspection of Fig. 3 shows that the relationship is 
pervasive throughout the record. 

We have performed a similar analysis on the Antarctic δD (a 
temperature proxy) record!®, by first removing low frequency, orbital-scale 
variability, then examining the power spectrum of the resulting 
suborbital variability*> (Fig. 4 and Extended Data Fig. 8). The 
suborbital-scale Antarctic temperature and AM records are remarkably similar, 
with the suborbital components of the AM and Antarctic temperature 
anti-correlated. We attribute this relationship to the bi-polar seesaw 
mechanism*®. As input of meltwater and ice into the North Atlantic 
is the presumed source of both sets of suborbital variability, it is not 
surprising that the suborbital component of Antarctic temperature 
variability also exhibits significant power in the precession and 
obliquity bands nearly in phase with NHSI. These results suggest that the 
change of NHSI induced by precession (with lesser involvement of 
obliquity) is the major external pacemaker of both conventional and 
the ‘small-scale’ termination events, which coincide with the large-scale 
weakening of the AM and warming in Antarctica**8. 

While the sequence of events surrounding each positive AM δ18O 
anomaly is similar, the relative amplitudes of these events vary (Fig. 3). 
Hence, the key issue is not why terminations are spaced by some 
discrete number of precession cycles, but rather why some of the events 
developed ultimately into full terminations. The possibility of a certain 
event evolving into a full termination may depend on the state of the 
climate system at that time, for example, favourable internal 
interplays/feedbacks of ice-sheet dynamics and size, insolation, and ocean 
and CO2 feedbacks*®°-*4. For example, as pointed out by Raymo*, 
a large and isostatically-compensated ice sheet would be relatively 
vulnerable to decay. In this scenario, precession cycles would be skipped or 
a ‘termination’ would be unfinished until the ice sheets were in such a 
state, on average after about 100 kyr. Regardless of exact mechanisms, 
the WMIs and smaller analogues to WMIs are indeed paced primarily 
by precession cycles, suggesting that the 100 kyr in the ‘100-kyr 
problem’ represents an approximate mean of integral numbers of precession 
cycles between terminations. 

Figure 4 | Cross-spectral comparison. Compared are insolation, AM, 
detrended (suborbital) AM and detrended (suborbital) Antarctic 
δD records over the past 640 kyr BP. a, Spectral analysis results of 
21 July insolation at 65° N (ref. 45; pink) and AM δ18O (green), and 
their coherence spectra (black). b, Spectral analysis results of 21 June 
insolation at 65° N (ref. 45; pink) and detrended AM Δδ18O (blue), and 
their coherence spectra (black). c, Spectral analysis results of detrended 
AM Δδ18O (blue) and detrended Antarctic ΔδD records (olive), and their 
coherence spectra (black). Coherence spectra are at 80% confidence level. 
Numbers in parentheses show phase differences in degrees between two 
spectra at precession (~23 and 19 kyr) and obliquity (~41 kyr) bands. 
Precession periodicity of ~23 kyr is significant in all four datasets. The 
variations in both AM Δδ18O and Antarctic ΔδD records also show 
significant power at obliquity band. Cross-spectrum analyses reveal a 
phase coherency at precession band between AM δ18O record and 21 July 
insolation and between Antarctic ΔδD and 21 June insolation. AM 
Δδ18O is nearly anti-phased with Antarctic ΔδD and 21 June insolation at 
precession band. 


Mid-Brunhes event and MIS 11 

Our record covers much of the time since the so-called Mid-Pleistocene 
revolution'! and crosses the MBE*’. The MBE is contemporaneous 
with T-V, a portion of which correlates with a WMI in our record. The 
amplitude of glacial–interglacial cycles in CO2, Antarctic temperature, 
and global ice volume (or sea level) increased substantially after the 
MBE. In contrast, interglacial CH4, δ18Oatm and AM intensity in our 
records are only slightly enhanced across the MBE (Fig. 5), similar to 
another hydrological record reported previously from low latitude**. 
Thus, changes in the AM behaviour across the MBE were not as large as 
changes at high latitudes and in CO2. In addition, there are numerous 
examples of high AM intensity during glacial periods when ice volume 
was large and CO2 was low (Fig. 5). Furthermore, the initial increase of 
the AM tends to start at each insolation minimum and in some cases 
does not appear to correlate with any obvious ice-volume and/or CO2 
changes (Extended Data Fig. 9). These observations indicate that, at the 
orbital scale, the AM and associated climate and atmospheric chemistry 
change are largely controlled by NHSI°. However, the post-MBE WMIs 
associated with glacial terminations are much more intense (Fig. 5b). 
It is possible that this change results from a shift in the response time 
of the AM to NHSI. However, in our view, a more likely cause is 
the observed shift to higher maximum ice volume resulting in higher 
ice and meltwater flux during terminations and larger perturbations of 
the climate system, including more extreme WMIs. In this light, T-V 
has the first of the large post-MBE-style WMIs. 

Figure 5 | Comparison of climate records across the MBE. 
a, d, e, Relative temperature’®, δ18Oatm (ref. 50) and CH4 (ref. 19) records, 
respectively, from Antarctic EDC ice core records (EDC3 chronology*®). 
b, The suborbital variations in the AM Δδ18O record. c, The orbital 
variations in the AM δ18O record. f, The composite CO2 record”°. 
g, The composite sea level'”. Horizontal dashed lines indicate interglacial 
amplitude shift across the MBE. Vertical bars indicate WMIs and 
associated T-I to T-VII (grey) as well as MIS 4/3 and 5.2/5.1 transitions 
(yellow). 

Figure 6 | Comparison among Holocene records. a, 21 July insolation 
at 65° N (ref. 45). b, The composite AM δ18O records. c, The Holocene 
terrestrial 18O/16O fractionation variation*’. d, The mean grain size of 
sortable silt (10–63 μm) for North Atlantic sediment cores MD2251 
and MD2024, a proxy of North Atlantic Deep Water formation*’. e, North 
Atlantic deep-flow dynamic proxy (low field magnetic susceptibility, k) 
records from cores CH77-02 (black) and MD08-3182Cq (grey)**. 
f, The South American monsoon record’. g, The δ18O record (WAIS, 
a temperature proxy) from western Antarctica*!. Vertical bar depicts the 
‘2-kyr shift’. 

Our new data also include a well-dated record of AM variability 
during MIS 11. Orbital parameters for MIS 11 are closer to those of 
the Holocene than for other recent interglacial periods, making it an 
analogue for the Holocene and future climate”!. However, uncertainties 
remain regarding its timing and duration. We estimate the beginning of 
AM MIS 11 to be the abrupt end of the T-V WMI at 426 ± 1 kyr BP. If we 
take the end to be the AM minimum at 396 ± 3 kyr BP (about the time 
of the half-height of the benthic δ18O MIS 11/10 shift!”), the duration 
is 30 ± 4 kyr. If we take the end to be the time of the half-height of the 
AM shift after the MIS 11 AM peak (at 399 ± 3 kyr BP), the duration is 
27 ± 4 kyr. These estimates are consistent within age uncertainties with 
the duration of MIS 11 defined by other less precisely dated records”’. 
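
The duration arithmetic above can be checked with a short sketch. The function name is hypothetical, and the way the two age errors are combined is an assumption: the quoted ±4 kyr matches a simple sum of the two errors (1 + 3 kyr), whereas combining them in quadrature would give about ±3.2 kyr. The text does not say which rule was used, so both are returned.

```python
def duration_with_uncertainty(start_kyr, start_err, end_kyr, end_err):
    """Return (duration, summed_error, quadrature_error), all in kyr.

    Ages are in kyr BP, so the older boundary (start) is the larger number.
    """
    duration = start_kyr - end_kyr
    summed_error = start_err + end_err                        # conservative sum
    quadrature_error = (start_err ** 2 + end_err ** 2) ** 0.5  # independent errors
    return duration, summed_error, quadrature_error


# Two candidate end points for MIS 11, using the ages quoted in the text.
end_at_am_minimum = duration_with_uncertainty(426, 1, 396, 3)
end_at_half_height = duration_with_uncertainty(426, 1, 399, 3)
```

Under the summed-error assumption the two cases reproduce the 30 ± 4 kyr and 27 ± 4 kyr given in the text.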


The ‘2-kyr shift’ 

Over the past ~2 kyr the AM has increased in an anomalous fashion 
relative to the downward trend in NHSI (Fig. 6 and Extended Data 
Fig. 10). This trend relates to climate change elsewhere around the 
world in much the same way as the millennial-scale shifts over the past 
640 kyr. We refer to this late Holocene anomaly as the ‘2-kyr shift’. The 
‘2-kyr shift’ is more than just a regional shift in the AM as evidenced by 
an in-phase, positively correlated shift in the δ18O of atmospheric O2 
(ref. 39), which integrates a broad swath of the globe and may, in large 
part, respond to AM intensity. The AM shift also anti-correlates with 
records of the South American monsoon” and temperature over some 
parts of Antarctica‘! (Fig. 6), a pattern similar to the one observed 
for millennial-scale events throughout much of the past several 
hundred kyr. A change in AMOC has been suggested as a key process that 
may explain aspects of such millennial-scale changes”. In this regard, 
two studies*** show generally decreasing AMOC for several millennia 
before ~2 kyr BP, then constant or perhaps slightly increasing AMOC 
for the last 2 kyr (Fig. 6). Thus, it is plausible that the origin of the ‘2-kyr 
shift’ is a progressive increase in the rate of AMOC over the past 2 kyr. 
Nevertheless, observational and modelling studies are critically needed 
to further assess the ‘2-kyr shift’. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 

230Th dating. Four stalagmite samples, SB-12, SB-14, SB-32 and SB-58, were 
collected from Sanbao Cave, Hubei, China. The sampling location is ~1,500m 
from the entrance, marked by a relative humidity of ~100%. The stalagmites were 
cut into halves along their growth axes and polished. A total of 196 sub-samples 
(25, 69, 84 and 18 for SB-12, SB-14, SB-32 and SB-58, respectively) were 
drilled for 230Th dating (Supplementary Table 1). In addition, 9 dates were 
also obtained for stalagmite D8 from Dongge Cave, China. The dating work 
was performed at the Minnesota Isotope Laboratory, University of Minnesota 
(Sanbao samples) and the Institute of Global Environmental Change, Xi'an 
Jiaotong University, Xi'an, China (Dongge samples). The 230Th dating 
techniques are essentially identical in the two laboratories. All measurements 
were made on Thermo-Finnigan Neptune multi-collector inductively 
coupled plasma mass spectrometers using the recently improved technique’. 
We use standard chemistry procedures to separate U and Th as described 
in ref. 51. The isotope dilution method with a triple-spike 229Th–233U–236U 
was employed to correct for instrumental fractionation and determine U and Th 
isotopic ratios and concentrations. The instrumentation, standardization and 
half-lives are reported in refs 5 and 52. All U and Th isotopes were measured either 
on the Faraday cups (larger sample size) or on a MasCom multiplier behind the 
retarding potential quadrupole in the peak-jumping mode (smaller sample size). 
We followed similar procedures of characterizing the multiplier as described in 
ref. 52. Uncertainties in U and Th isotopic data were calculated offline at the 2σ 
level, including corrections for chemistry/instrument blanks, multiplier dark noise, 
abundance sensitivity, tails, and contents of the same four nuclides in the spike 
solution®**. Corrected 230Th ages assume an initial 230Th/232Th atomic ratio of 
(4.4 ± 2.2) × 10^-6, the value for a material at secular equilibrium with the bulk 
Earth 232Th/238U value of 3.8. The correction is negligible because the samples used 
in this study have high U and low Th contents. The age model for each stalagmite 
is established by either linear interpolation or polynomial fitting (Extended Data 
Fig. 3). 

Stable isotope analysis. Oxygen isotopic composition (δ18O) of stalagmite samples 
was analysed at three laboratories: Universität Innsbruck, Austria (the top 12.4 cm 
of sample SB-14, ~2,400 subsamples), Nanjing Normal University, China (133, 585, 
140 and 155 subsamples from SB-12, SB-14, SB-32 and SB-58, respectively), and 
Xi'an Jiaotong University (80 subsamples from D8). Results are reported in per 
mil (‰), relative to the Vienna PeeDee Belemnite (VPDB) standard. We obtained 
a total of ~3,360 stable isotope data (Supplementary Table 1). The techniques 
used in Universität Innsbruck are described in ref. 53. Stable isotope samples were 
micromilled perpendicularly to the extension axes of the stalagmites at 0.05 to 
0.1 mm increments and analysed using an on-line carbonate preparation system 
(Gasbench II) interfaced with an isotope ratio mass spectrometer (DeltaplusXL). 
The long-term reproducibility is ~0.08‰ (1σ). The stable isotope measurements 
in Nanjing Normal University were made on a Thermo-Finnigan MAT-253 mass 
spectrometer fitted with a Kiel Carbonate Device III. Stable isotope samples were 
milled using carbide dental burrs ranging in size from 0.3 to 0.5 mm along the 
central growth axis of stalagmites. Samples were calibrated against the NBS-19 
standard. Standard measurements have an analytical precision of typically 0.08‰ 
(1σ). The measurements in Xi'an Jiaotong University were made on a 
Thermo-Finnigan MAT-253 mass spectrometer fitted with a Kiel Carbonate Device IV. 
Stable isotope samples were micromilled perpendicularly to the extension axes of 
the stalagmite D8 at 0.05 to 0.1 mm increments. Duplicate measurements of NBS19 
and TTB1 standards show a long-term reproducibility of ~0.1‰ or better (1σ). 

Composite Chinese cave δ18O record over the last 640 kyr. The composite 
AM δ18O record over the last 640 kyr is based on previously published data 
(<384 kyr BP)? 4455 and new data from four Sanbao stalagmite records (>384 kyr BP) 
from this study (SB-12, SB-14, SB-32 and SB-58). The composite δ18O record is 
constructed based on two criteria to select temporally overlapped records: the dating 
control and the temporal resolution (see details in Supplementary Table 1). The 
new δ18O record from Dongge Cave (D8) replaces the previously published records 
between 217.2 and 225.3 kyr BP, as the age control of stalagmite D8 is superior for 
this time period. The overall average temporal resolution of the record is 85 years. 
The lowest resolution (~500 years) is in the interval of the previous record between 
~305 and 260 kyr BP, and the rest of the time periods have resolution better than 
200 years (Supplementary Table 1). 

Normalization and detrending. In order to remove the orbital insolation 
component from the composite AM δ18O data, we used two different methods, z-standard 
and principal components analysis (PCA). For the z-standard method, we first 
converted the composite δ18O data to z-standard by using the mean and standard 
deviation of each data set (that is, zero-mean normalization). An average data 
resolution of 200 years was used to allow point-to-point alignment. The detrended 
δ18O data (Δδ18O) was obtained by subtracting 21 July insolation at 65° N (refs 3, 6) 
from the δ18O record after standardization and equal spacing to the 200-year 
interval. Insolation is normalized to be equal in magnitude to δ18O and assigned an 
opposite algebraic sign: Δδ18O = δ18O − (−NHSInorm) = δ18O + NHSInorm. For the 
PCA method, we also converted the composite AM δ18O data in 200-year 
time-steps to synthesize the insolation in the same time-steps. We then standardized the 
common principal components obtained from the insolation time series and the 
original AM records. The PCAs are computed by using SPSS 15.0. The difference 
between the AM record and the common principal components was then used to 
characterize the millennial-scale variability. In addition to 21 July insolation, we 
also calculated the detrended results by using different insolation curves (21 June, 
6 July and 6 August insolation at 65° N), in order to test the sensitivity. We found 
that the results are similar (Extended Data Figs 5 and 6). The methods and results 
used here are essentially similar to those described in ref. 8. The Δδ18O record 
obtained by the z-standard method is used in the figures for comparison. 
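
The z-standard step can be illustrated with a minimal sketch. The function names are hypothetical, NumPy is assumed, and linear interpolation onto the 200-year grid is an assumption (the text states only that the data were equally spaced at that interval). This is not the authors' code.

```python
import numpy as np


def zscore(x):
    """Scale a series to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def detrend_z_standard(age_kyr, d18o, insol_age_kyr, insolation, step_kyr=0.2):
    """Return (grid, residual) with residual = z(d18O) + z(insolation).

    Ages must be increasing. Both series are linearly interpolated onto a
    common grid (200-year spacing by default) before standardization.
    """
    age_kyr = np.asarray(age_kyr, dtype=float)
    insol_age_kyr = np.asarray(insol_age_kyr, dtype=float)
    start = max(age_kyr.min(), insol_age_kyr.min())
    stop = min(age_kyr.max(), insol_age_kyr.max())
    grid = np.arange(start, stop + step_kyr / 2, step_kyr)
    d18o_z = zscore(np.interp(grid, age_kyr, d18o))
    insol_z = zscore(np.interp(grid, insol_age_kyr, insolation))
    # High insolation corresponds to low d18O, so the insolation term enters
    # with the opposite sign: d18O - (-NHSI_norm) = d18O + NHSI_norm.
    return grid, d18o_z + insol_z
```

For a synthetic δ18O series that is an exact mirror image of insolation, the residual is zero everywhere; a millennial-scale positive excursion added to that series survives in the residual, which is the behaviour the detrending is meant to isolate.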

The method used for detrending of the Antarctic δD record from EDC ice 
cores (Extended Data Fig. 8) was modified from that described in ref. 35. We first 
defined the long-term trend from the δD record by binning the combined data 
using the mean over 0.1-kyr intervals on 6-kyr windows on each data point. The 
window length of 6 kyr was chosen to effectively remove the glacial–interglacial 
trend*». Sensitivity tests through varying the length of the 6-kyr window by a 
factor of two (that is, between 3 and 12 kyr) show no substantial impact on the 
spectral results. Second, we obtained the detrended record (ΔδD) by subtracting 
the long-term trend from the δD record. Our results are similar to those described 
in ref. 35. 
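
One possible reading of this two-step procedure is sketched below: bin the record to 0.1-kyr means, take a centred 6-kyr running mean as the long-term trend, and subtract it. The function names, the handling of empty bins (linear interpolation) and the shrinking window at the record ends are all assumptions. The exact procedure is given in ref. 35, not here.

```python
import numpy as np


def bin_to_grid(age_kyr, values, step_kyr=0.1):
    """Average samples into bins of width step_kyr; return (centres, means)."""
    age = np.asarray(age_kyr, dtype=float)
    val = np.asarray(values, dtype=float)
    idx = np.floor((age - age.min()) / step_kyr).astype(int)
    n_bins = idx.max() + 1
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=val, minlength=n_bins)
    centres = age.min() + (np.arange(n_bins) + 0.5) * step_kyr
    filled = counts > 0
    # Empty bins (assumption) are filled by interpolating between neighbours.
    means = np.interp(centres, centres[filled], sums[filled] / counts[filled])
    return centres, means


def running_mean(x, n_points):
    """Centred running mean; the window shrinks near the ends of the series."""
    kernel = np.ones(n_points)
    total = np.convolve(x, kernel, mode="same")
    weight = np.convolve(np.ones_like(x), kernel, mode="same")
    return total / weight


def detrend_running_mean(age_kyr, values, step_kyr=0.1, window_kyr=6.0):
    """Return (centres, residual) after removing a running-mean trend."""
    centres, binned = bin_to_grid(age_kyr, values, step_kyr)
    n_points = int(round(window_kyr / step_kyr)) | 1  # force an odd length
    return centres, binned - running_mean(binned, n_points)
```

Passing `window_kyr=3.0` or `window_kyr=12.0` corresponds to the factor-of-two sensitivity test mentioned above.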

Spectrum analysis. In order to identify periodic components in the spectrum of 
our AM Δδ18O record, we applied the spectral analysis following the 
Blackman-Tukey method* using the ARAND software package. The following parameters 
were used to optimize bias/variance properties of spectrum estimates: number of 
lags = 1,200 (~1/3 length of record) and samples per analysis = 2,000. The results 
show that the Δδ18O records have significant power at ~23 kyr, weaker power 
at ~41 kyr and insignificant power at ~100 kyr (Fig. 4, Extended Data Figs 5 
and 6). 
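
For readers unfamiliar with the Blackman-Tukey approach, the following is a generic sketch of the estimator: the autocovariance is computed up to a maximum lag, tapered with a lag window, and cosine-transformed. It is not a reimplementation of ARAND. The Hann (Tukey-Hanning) lag window, the function name and the synthetic test series are assumptions for illustration only.

```python
import numpy as np


def blackman_tukey(x, dt, n_lags, freqs):
    """Blackman-Tukey power spectrum estimate at the requested frequencies.

    x      : evenly spaced series
    dt     : sample spacing (kyr)
    n_lags : maximum lag retained in the autocovariance
    freqs  : frequencies (cycles per kyr) at which to evaluate the spectrum
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance estimate for lags 0..n_lags.
    acov = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(n_lags + 1)])
    lags = np.arange(n_lags + 1)
    # Hann lag window (an assumption; other windows are also in common use).
    window = 0.5 * (1.0 + np.cos(np.pi * lags / n_lags))
    weighted = window[1:] * acov[1:]
    cos_terms = np.cos(2.0 * np.pi * np.outer(freqs, lags[1:]) * dt)
    return dt * (acov[0] + 2.0 * cos_terms @ weighted)


# Synthetic 640-kyr series at 200-year spacing with 23- and 41-kyr cycles.
t = np.arange(3200) * 0.2
series = np.sin(2 * np.pi * t / 23.0) + 0.4 * np.sin(2 * np.pi * t / 41.0)
freqs = np.linspace(0.005, 0.2, 400)
power = blackman_tukey(series, dt=0.2, n_lags=1200, freqs=freqs)
peak_period_kyr = 1.0 / freqs[np.argmax(power)]
```

The number of lags controls the trade-off named in the text: fewer lags reduce the variance of the estimate at the cost of frequency resolution.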

Cross-spectrum analysis results are obtained between AM δ18O, AM Δδ18O, 
Antarctic ΔδD, and insolation records over the last 640 kyr using the ARAND 
software package. Detailed methods are described in ref. 56. The coherency 
spectra are compared with the 80% non-zero coherency level, resulting in a dominant 
cyclicity of ~23 kyr for all the analysis results (Fig. 4, Extended Data Figs 5, 6 
and 8). However, the deviation of the phase spectrum from the zero-phase line 
suggests that the detrended AM records may have different phases relative to 
insolation at precession bands (~23 kyr), depending on the time of insolation 
used in the detrending process. If 21 July insolation is used for the detrending, 
as suggested by both empirical’ and theoretical® studies, the millennial 
variability in the detrended AM record (Δδ18O record) is then nearly anti-phase 
with 21 June insolation (~180° ± 10° or ±1 kyr). In addition, because absolute 
230Th dating errors increase progressively with sample age, we also tested all the 
spectrum analysis results by using the AM record only for the last 480 kyr BP 
when the dating errors are smaller. In all cases, we essentially obtained the same 
analysis results. 
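
The relation between a phase difference in degrees and a time offset in kyr can be made explicit with a short helper (the function name is hypothetical). At a 23-kyr period, 180° corresponds to 11.5 kyr and 10° to roughly 0.6 kyr, so the ±1 kyr quoted above is presumably a rounded figure. That reading is an inference, not something the text states.

```python
def phase_to_lag_kyr(phase_deg, period_kyr):
    """Convert a phase difference in degrees to a time offset in kyr."""
    return phase_deg / 360.0 * period_kyr


half_cycle_kyr = phase_to_lag_kyr(180.0, 23.0)      # anti-phase offset
phase_error_kyr = phase_to_lag_kyr(10.0, 23.0)      # offset for 10 degrees
```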

Chronologies. We use original chronologies of other climate records in most cases, 
including the Greenland ice core record (GICC05 chronology)”, Antarctic ice 
core CO2, CH4 and δ18Oatm, dust and temperature records (EDC3 chronology)**, 
the composite CO2 record (AICC2012 chronology)”, and the stacked benthic 
δ18O or the composite sea level record (LR04 chronology)'”°”. However, in order 
to compare our AM record with marine IRD and benthic δ18O records of ODP 980 
and U1314 cores from the North Atlantic in Fig. 2, we have tuned their 
chronologies to our composite AM record through the correlation strategy as described in 
the main text: that is, by simple shifting of original chronologies to align the IRD 
events with WMIs as depicted by the grey bars in Fig. 2. In addition, we also tuned 
EDC ice core chronologies around the MIS 4/3 and 5.2/5.1 transitions in Fig. 2 by 
synchronizing the abrupt AM change with the CH4 jump as described in the text. 
Amounts of age shifts from the original chronologies of the marine and ice core 
records are described in the figure legend. 

Extended Data Figure 1 | Schematic map of the vast Asian summer 
monsoon system. Arrows depict wind directions, yellow labels show wind 
names. The Mascarene High is a high pressure system near the Mascarene 
Islands. Stars indicate Sanbao (31° 40′ N, 110° 26′ E), Hulu (32° 30′ N, 
119° 10′ E) and Dongge (25° 17′ N, 108° 5′ E) caves. The AM composite 
record is constructed from speleothem δ18O records from these three 
caves. The map was constructed using NASA’s World Wind program 
(http://worldwind.arc.nasa.gov/java/). 

Extended Data Figure 2 | New speleothem records from China. Four 
new stalagmite δ18O records used in this study are from Sanbao Cave, 
Hubei, China (labelled by sample numbers SB-12, SB-14, SB-32 and 
SB-58), and one is from Dongge Cave, Guizhou, China (labelled by 
sample number D8) (Fig. 1). Error bars indicate 230Th ages and errors 
(2σ). The 230Th dating method is described in ref. 5 and the dating results 
are listed in Supplementary Table 1. A high degree of similarity among 
the coeval portions of different δ18O records (the replication test”°*) 
demonstrates that kinetic factors and water/rock interactions do not have a 
substantial effect on the speleothem δ18O values. 

Extended Data Figure 3 | Stalagmite age models. a–e, Age models are 
shown for five stalagmites: SB-12 (a), SB-14 (b), SB-32 (c), SB-58 (d) 
from Sanbao Cave, and D8 (e) from Dongge Cave. The chronology of 
the upper part of SB-14 is established by linear interpolation between 
successive 230Th dates, whereas chronologies for other samples are based 
on polynomial fitting of 230Th dates. The vertical error bars depict errors 
(2σ) of 230Th dates. 

Extended Data Figure 4 | AM records over the past 640 kyr BP. 
a, Previously published δ18O records over the past 384 kyr BP (black) from 
Hulu, Dongge and Sanbao caves?-“4?5455, and new SB-12 (blue) and SB-32 
(olive) δ18O records. b, New SB-14 δ18O record. c, New SB-58 (purple) 
and D8 (pink) δ18O records. d, Composite AM δ18O record over the past 
640 kyr BP. The record is constructed from previous data (<384 kyr BP) 
and new data from four stalagmites from Sanbao Cave (>384 kyr BP) 
(see details in Methods and Supplementary Table 1). e, Detrended AM 
record (Δδ18O) is obtained by using the z-standard method to remove 
the insolation component (21 July insolation at 65° N) (see Methods). 
f, Detrended AM result from ref. 8, which is essentially identical to our 
results. Minor differences exist because the AM δ18O records used in 
ref. 8 are slightly different. 

Extended Data Figure 5 | The AM Δδ18O record over the past 640 kyr BP 
obtained by the z-standard method and cross-spectral comparison 
with insolation. a, Comparison among different AM Δδ18O records 
detrended by subtracting 21 June (green), 6 July (brown), 21 July (blue) 
and 6 August (orange) insolation respectively, using the z-standard 
method (see Methods). b, Cross-spectral comparisons between the 
insolation for detrending (pink) and Δδ18O records (blue) detrended by 
21 June (i), 6 July (ii), 21 July (iii) and 6 August (iv) insolation respectively, 
using the z-standard method. Numbers in parentheses show the phase 
differences in degrees between insolation and Δδ18O record. In all cases, 
the most significant power in the Δδ18O record is in the precession band 
(~23 kyr). The ~41-kyr power is also present, but is relatively weak. 
The ~100-kyr power is insignificant. 

Extended Data Figure 6 | The AM Δδ18O record over the past 640 kyr BP 
obtained by the principal component analysis method and cross-spectral 
analysis with insolation. a, b, As Extended Data Fig. 5, except the principal 
component analysis method was used, instead of the z-standard method. 

Extended Data Figure 7 | Comparison of millennial-scale climate events 
over the past 140 kyr. a, North Atlantic ODP980 (dark blue)’ and LR04 
(light blue)*” benthic δ18O records. b, ODP980 IRD records’. We correlate 
the IRD event around T-II to the WMI with similar duration (depicted 
by grey bar), which is consistent with the obvious offset of its δ18O shift 
related to that in LR04 record (dashed bar). YD and H1 to H6 indicate 
the Younger Dryas event, and Heinrich Stadial events 1 to 6, respectively. 
c, Greenland ice core (NGRIP) δ18O record”’. d, Antarctic ice core (EDC) 
dust record*’. e, Detrended Antarctic δD record'* (ΔδD), using a method 
modified from ref. 35. f, AM millennial variability (Δδ18O, detrended 
from the composite AM δ18O record by subtracting 21 July insolation at 
65° N). g, Composite AM δ18O record. h, EDC CH4 record". Vertical grey 
bars indicate major weak AM intervals (WMIs) and corresponding events 
(increase of temperature and decrease of dust flux in Antarctica, cold 
events in Greenland and IRD events in the North Atlantic Ocean). Dashed 
lines depict correlations between abrupt AM intensification and CH4 jump 
(brown) and between weak monsoon and IRD events (black). All ice core 
records are on their EDC3 chronology**. Notably, the AM Δδ18O and 
Antarctic ΔδD records show striking similarity, demonstrating a common 
millennial-scale variability (Fig. 3). 

Extended Data Figure 8 | Comparison and cross-spectral analyses 
between insolation, AM Δδ18O and Antarctic ΔδD records. a, AM 
Δδ18O record (blue). b, Detrended Antarctic δD record!’ (ΔδD, olive), using 
a method modified from ref. 35. 21 June insolation at 65° N (ref. 45; pink) 
is plotted for comparison in a and b. c, Comparison between Antarctic 
ΔδD (olive) and AM Δδ18O (blue) records. d, Cross-spectral analyses of 
the Antarctic ΔδD record (olive) with 21 July insolation at 65° N (pink) 
(left), and with the AM Δδ18O record (blue) (right), respectively. Numbers 
in parentheses show the phase differences in degrees. In all records, 
the precession cycle of ~23 kyr is significant. The phase of the Antarctic 
ΔδD record at precession band is close to 21 June insolation, and nearly 
anti-phased with the AM Δδ18O record. The remarkable correlation 
between weak AM (positive Δδ18O anomaly) and warm Antarctica 
(positive ΔδD anomaly) is evident (c). 

Extended Data Figure 9 | Comparison of the AM variability with sea 
level (global ice volume) and atmospheric CO2 changes. Upper panel, 
interval from 350 to 0 kyr BP; lower panel, interval from 650 to 300 kyr BP. 
In both panels: a, AM Δδ18O record. b, AM δ18O record (green) and 
21 July insolation at 65° N (pink)**. c, Composite sea level record'’. 
d, Composite atmospheric CO2 record”. Grey bars show the timing of 
WMIs and associated terminations. Two yellow bars indicate the two 
millennial-scale positive anomalies (or WMIs), marking the ‘unfinished 
terminations’*°, namely the MIS 4/3 and 5.2/5.1 transitions (Fig. 2). For the 
T-IIIa WMI we also indicate the correlation previously made by Cheng 
et al.* with a beige bar. Although we consider this as a plausible alternative 
correlation, we prefer the new correlation presented in Figs 1, 3 and 5, 
and in this figure. The new correlation fits much better with the original 
chronologies of the ice core and marine records. In addition, the match of 
the adjacent high δ18O AM anomalies and the ice rafted debris record’ is 
better with the new correlation. Some examples of initial AM rises around 
NHSI minima are depicted by green arrows and dashed lines, which do 
not appear to link directly to either global ice volume or CO2 changes. 

Extended Data Figure 10 | Comparison between the Holocene and MIS 11 
on the basis of the insolation alignment. a, 21 July insolation at 65° N for 
the Holocene (green) and MIS 11 (pink)**. b, c, Composite AM 
δ18O records during MIS 11 and the Holocene, respectively. d, AM cave 
δ18O record from the Indian monsoon domain™. e, North African 
monsoon record (seawater δ18O record from the marine sediment core, 
MD03-270, from the Gulf of Guinea)*!. f, South American monsoon 
record from Cueva del Tigre Perdido, northern Peru®®. g, South African 
monsoon record from Cold Air Cave, Makapansgat Valley, South Africa®. 
Vertical bar depicts the ‘2-kyr shift’. These records show ‘2-kyr shift’ 
trends that are different from their trends exhibited in the middle to late 
Holocene interval. The monsoon ‘2-kyr shift’ also appears to show an 
opposite inter-hemispheric pattern. 


A combinatorial strategy for treating 
KRAS-mutant lung cancer 


Eusebio Manchado, Susann Weissmueller, John P. Morris IV, Chi-Chao Chen, Ramona Wullenkord, Amaia Lujambio, 
Elisa de Stanchina, John T. Poirier, Justin F. Gainor, Ryan B. Corcoran, Jeffrey A. Engelman, Charles M. Rudin, 
Neal Rosen & Scott W. Lowe 


Therapeutic targeting of KRAS-mutant lung adenocarcinoma represents a major goal of clinical oncology. KRAS itself has 
proved difficult to inhibit, and the effectiveness of agents that target key KRAS effectors has been thwarted by activation of 
compensatory or parallel pathways that limit their efficacy as single agents. Here we take a systematic approach towards 
identifying combination targets for trametinib, a MEK inhibitor approved by the US Food and Drug Administration, which 
acts downstream of KRAS to suppress signalling through the mitogen-activated protein kinase (MAPK) cascade. Informed 
by a short-hairpin RNA screen, we show that trametinib provokes a compensatory response involving the fibroblast 
growth factor receptor 1 (FGFR1) that leads to signalling rebound and adaptive drug resistance. As a consequence, genetic 
or pharmacological inhibition of FGFR1 in combination with trametinib enhances tumour cell death in vitro and in vivo. 
This compensatory response shows distinct specificities: it is dominated by FGFR1 in KRAS-mutant lung and pancreatic
cancer cells, but is not activated or involves other mechanisms in KRAS wild-type lung and KRAS-mutant colon cancer 
cells. Importantly, KRAS-mutant lung cancer cells and patients’ tumours treated with trametinib show an increase in 
FRS2 phosphorylation, a biomarker of FGFR activation; this increase is abolished by FGFR1 inhibition and correlates with 
sensitivity to trametinib and FGFR inhibitor combinations. These results demonstrate that FGFR1 can mediate adaptive
resistance to trametinib and validate a combinatorial approach for treating KRAS-mutant lung cancer.


KRAS encodes a GTPase that couples growth factor signalling to the 
MAPK cascade and other effector pathways. Oncogenic KRAS muta- 
tions compromise its GTPase activity, leading to accumulation of KRAS 
in the active GTP-bound state and thereby to hyperactive signalling that 
initiates and maintains tumorigenesis’. Owing to the high frequency of 
KRAS mutations in lung adenocarcinoma and other cancers, strategies 
to inhibit the KRAS protein or to exploit synthetic lethal interactions 
with a mutant KRAS gene have been widely pursued but have been 
fraught with technical challenges or produced inconsistent results”~’. 
Conversely, strategies to target key RAS effectors including MAPK 
pathway components RAF, MEK, and ERK have been hindered by 
toxicities associated with their sustained inhibition and/or adaptive 
resistance mechanisms*""". 


Screen to identify trametinib sensitizers 

Hypothesizing that sustained MAPK inhibition is necessary, but not 
sufficient, for targeting KRAS-mutant cancers, we performed a pool- 
based short hairpin RNA (shRNA) screen to identify genes whose inhi- 
bition sensitizes KRAS-mutant lung cancer cells to the FDA-approved 
MEK inhibitor trametinib (Supplementary Table 1). A customized
shRNA library targeting the human kinome was
introduced into the TRMPVIN vector that we previously optimized 
for negative selection screening'”"’. In this system, cassettes encod- 
ing a mir-30 shRNA linked to a dsRed fluorescent reporter are placed 
downstream of a tetracycline responsive promoter, enabling doxycy- 
cline dependent gene silencing and the facile tracking and/or sorting 
of shRNA expressing cells (Extended Data Fig. 1a)". This library was 
transduced into H23 KRAS G12C mutant lung cancer cells expressing a
reverse-tet-transactivator (rtTA3). The transduced populations were
then treated with doxycycline in the presence or absence of 25 nM
trametinib, a dose that effectively inhibits ERK signalling without sub- 
stantially affecting proliferation (Extended Data Fig. 1b-e). After ten 
population doublings, changes in shRNA representation were deter- 
mined by sequencing of shRNAs amplified from dsRed-sorted cells 
(Extended Data Fig. 1b). 

As expected, shRNAs targeting essential genes (RPA1 and CDK11A)
were strongly depleted in both vehicle and trametinib-treated cells, 
whereas the relative representation of neutral non-targeting control 
shRNAs (Renilla (REN)) remained unchanged (Fig. 1a and Extended 
Data Fig. 1f, g). Using selection criteria that required an average four- 
fold or greater depletion between conditions, we identified 64 shRNAs 
corresponding to 53 genes that were selectively depleted upon MEK 
inhibition in trametinib-treated compared with untreated cells (Fig. 1a
and Extended Data Fig. 1h). Of these, shRNAs targeting the eight genes
for which multiple shRNAs were identified as hits were validated using cell
competition assays in multiple KRAS-mutant lung lines. These studies
identified BRAF, CRAF, ERK2, and FGFR1 as the top candidates in our 
screen (Fig. 1b and Extended Data Fig. 2a). 

Trametinib has superior pharmacological properties compared 
with other MEK inhibitors because it impairs feedback reactivation 
of ERK?®. Still, the fact that MAPK components were identified as 
hits in our screen implied that pathway reactivation eventually 
occurs. Indeed, although trametinib stably inhibits ERK signal- 
ling at 48 h—a time where rebound occurs with other agents'’—we 
observed an increase in phospho-ERK after 6-12 days of drug expo- 
sure (Fig. 1c). This rebound was reduced by subsequently increasing 
Figure 1 | Suppression of MAPK signalling effectors and FGFR1
sensitizes KRAS-mutant lung cells to trametinib. a, Relative abundance
of each shRNA in the library in vehicle- or trametinib-treated H23 cells
after ten population doublings on doxycycline. The mean of three
(vehicle) and two (trametinib) replicates is plotted. Positive and negative
controls included shRNAs targeting RPA1 and CDK11A (red circles), and
Renilla (REN) luciferase (green circles). b, Quantification of fluorescent
cells in competitive proliferation assays in H23 cells transduced with
non-targeting control (Ren) or the indicated shRNAs. Data presented
as mean (n = 2). *P < 0.05, **P < 0.01 (unpaired two-tailed t-test).
c, Immunoblot of KRAS-mutant lung cells treated with 25 nM
trametinib for various times. d, Immunoblot of H23 cells transduced
with a doxycycline-inducible shRNA targeting ERK2 and treated with
trametinib (Tram.; 25 nM) and doxycycline (Dox.) for the times shown.
e, Immunoblot of H23 cells treated with trametinib (25 nM), SCH772984
(500 nM), or a combination for the times shown. f, Clonogenic assay
of H23 cells treated with trametinib, ERK inhibitor SCH772984, or a
combination as indicated (n = 3). g, Immunoblot of KRAS-mutant lung
cancer cells treated with 25 nM trametinib for various times. For gel source
data, see Supplementary Fig. 1.


the concentration of trametinib, indicating that it is MEK depend- 
ent (Extended Data Fig. 2b). Accordingly, inducible knockdown of 
ERK2, CRAF, and BRAF blocked ERK signalling rebound and reduced 
clonogenic growth after trametinib treatment (Fig. 1d and Extended 
Data Fig. 2c, d). Similar effects were observed in KRAS-mutant lung 
cancer cells treated with trametinib and the ERK inhibitor SCH772984 
(Fig. 1e, f and Extended Data Fig. 3)'*. These observations emphasize
the marked dependency of KRAS-mutant tumours on the MAPK sig- 
nalling pathway. 

In agreement with other studies, KRAS-mutant cells treated with 
trametinib also displayed compensatory activation of the PI3K 
and JAK/STAT pathways as assessed by AKT and STAT3 phospho- 
rylation, respectively (Fig. 1d, e, g and Extended Data Figs 2c, 3b 
and 4a)'+!°. Although the increase in STAT3 phosphorylation was 
transient (Extended Data Fig. 4a), AKT phosphorylation was sustained
(Fig. 1g). In contrast to their effects on ERK signalling rebound, genetic 
or pharmacological inhibitions of MAPK signalling had little effect on 
the trametinib-induced increase in pAKT (Fig. 1d, e and Extended Data 
Figs 2c and 3b). The activation of multiple signalling pathways after 
trametinib treatment probably reflects a relief in pleiotropic feedback 
mechanisms produced by hyperactive RAS signalling in KRAS-mutant 
cells®?. 


FGFR1 mediates adaptive drug resistance

Several RTKs have been implicated in adaptive resistance to RAS 
pathway antagonists®?!!!>~*°. The identification of FGFR1 shRNAs
as trametinib sensitizers raised the possibility that FGFR1 mediates 
MAPK and PI3K activation in trametinib-treated KRAS-mutant cells. 
In agreement, treatment of KRAS-mutant lung tumour cell lines with 
trametinib increased FGFR1 receptor and/or ligand expression together
with FGFR pathway activation as assessed by an increase in phosphoryl- 
ation of the FGFR adaptor protein FRS2 (Fig. 2a, b and Extended Data 
Figs 2b and 4b-e)*!. In turn, FGFR1 activation correlated with an 
increase in the levels of RAS-GTP, phospho-AKT, and phospho-ERK 
(Fig. 2b and Extended Data Fig. 4e), which was prevented by FGFR1 
knockdown (Fig. 2c). Accordingly, FGFR1 shRNAs did not inhibit the 
proliferation of KRAS-mutant lung cancer, but displayed synergis- 
tic inhibitory effects when combined with trametinib (Fig. 2d, e and 
Extended Data Fig. 4f, g). 

The combinatorial effects of FGFR1 inhibition and trametinib 
showed distinct specificities: for example, shRNAs targeting FGFR1 
or FRS2, but not those targeting FGFR2 and 3, sensitized KRAS- 
mutant lung cancer cells to trametinib (Fig. 2f and Extended Data 
Fig. 4h, i). By contrast, FGFR1 knockdown had little impact on tra- 
metinib sensitivity in KRAS wild-type lung cancer cells (Fig. 2g). 
While FGFR1 shRNAs synergized with trametinib in two KRAS- 
mutant pancreatic cancer cell lines, they showed little activity in 
trametinib-treated KRAS-mutant colorectal lines (Extended Data 
Fig. 5a). Importantly, this genotype and tissue specificity correlated 
with the ability of trametinib to trigger FRS2 phosphorylation when 
applied as a single agent (Extended Data Fig. 5b-d). Therefore, treat- 
ment of certain KRAS-mutant tumour types with trametinib induces 
a dependency on FGFR1 signalling that promotes adaptive drug 
resistance. 


FGFR1 inhibition enhances trametinib effects 
We next tested whether therapeutic strategies combining trametinib 
with an FGFR1 inhibitor could be effective in treating some KRAS- 
mutant lung cancers by combining trametinib with ponatinib, an 
FDA-approved multikinase inhibitor that inhibits FGFR1 and is 
being tested clinically for activity against FGFR1-amplified lung 
cancer (Extended Data Fig. 6a) (ref. 22 and https://clinicaltrials.gov/show/NCT01935336).
Ponatinib had little effect on KRAS-mutant
cells but countered the trametinib-induced increase in pFRS2, pERK, 
and pAKT, and synergized with trametinib in inhibiting cell proliferation
(Fig. 3a-c and Extended Data Fig. 6b-e). As observed in
our genetic studies, this combination also showed combined activity 
in human KRAS-mutant pancreatic cancer cells and a Kras-mutant 
murine lung adenocarcinoma line (Extended Data Fig. 7a-c), but
to a lesser extent in KRAS wild-type lung cancer cells or KRAS- 
mutant colon cancer cells (Fig. 3c). Although it remains possible 
that the synergistic effects of this combination involve the ability of 
ponatinib to target additional kinases, similar results were observed 
with two other chemically distinct FGFR inhibitors (Extended 
Data Figs 7d, e and 8a)**". Importantly, sensitivity to the combina- 
tion of trametinib and FGFR inhibition correlated with the degree 
of pFRS2 induction after trametinib treatment (Extended Data 
Fig. 8b, c). 

While our genetic and pharmacological studies establish the impor- 
tance of the MAPK pathway in adaptive resistance to trametinib, 
Figure 2 | Feedback activation of FGFR1 mediates adaptive resistance
to trametinib in KRAS-mutant lung cells. a, Quantitative reverse
transcription PCR (qRT-PCR) for FGFR1 and FGF2 in H23 cells treated
with trametinib for the indicated times (n = 3). AU, arbitrary units.
b, Immunoblot of H23 cells treated with 25 nM of trametinib for various
times. c, Immunoblot of H23 cells transduced with a doxycycline-
inducible shRNA targeting FGFR1 and treated with trametinib (25 nM)
and doxycycline for the times shown. d, Quantification of fluorescent
cells in competitive proliferation assay in H23 and H2030 cells transduced
with doxycycline-inducible non-targeting control (Ren) or FGFR1
shRNAs (n = 3). e, Clonogenic assay of H23 cells transduced with
FGFR1 and non-targeting control shRNAs, and cultured with
dimethylsulfoxide (DMSO) or trametinib (25 nM). Relative growth of
DMSO- (grey bars) and trametinib-treated cells (blue and red bars)
is shown (right) (n = 3). f, g, Quantification of fluorescent cells in
competitive proliferation assays in H23 (f) and the indicated lung cancer
cells (g) transduced with doxycycline-inducible non-targeting control
(Ren (Renilla)) or the indicated shRNAs (n = 3). a, Paired two-tailed
t-test. d, f, g, Unpaired two-tailed t-test. Data presented as mean ± s.d.
*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. For gel source data,
see Supplementary Fig. 1.


we reasoned that the compensatory increase in PI3K/AKT signalling 
also plays a role and that its inhibition by ponatinib contributes to the 
effects of this drug combination. Accordingly, PTEN knockdown, 
which can increase PI3K signalling independently of RTK activation, 
promoted partial resistance to the drug combination in KRAS-mutant 
H2030 cells. This effect was not observed in H460 cells, a KRAS-mutant 
line that also harbours an activating mutation in the p110α catalytic
subunit of PI3K (Fig. 3d and Extended Data Fig. 9a-c). Consistent with
a role for PI3K signalling in promoting cell survival, co-treatment of 
H23 cells with trametinib and ponatinib triggered substantial apoptosis 
in a manner that was not observed after treatment with trametinib alone 
or in combination with an ERK inhibitor (Fig. 3e and Extended Data 
Fig. 9d). Thus, the combined ability of ponatinib to impact reactivation 
Figure 3 | Ponatinib synergizes with trametinib in inhibiting cell
proliferation of KRAS-mutant lung cells. a, Clonogenic assay of H23
cells treated with trametinib, ponatinib, or their combination as indicated.
Percentage inhibition at each concentration of the drugs in H23 and H2030
cells is presented (right). Data presented as mean of three independent
experiments. b, Immunoblot of H23 cells treated with trametinib (25 nM),
ponatinib (750 nM), or their combination for the times shown. c, Dot
plot illustrating the sensitivity increase to trametinib after treatment with
ponatinib (100 nM) in a panel of KRAS-mutant (n = 15) and KRAS wild-
type (n = 15) cancer cell lines. Data presented as mean of two independent
replicates. d, Quantification of the relative growth of H2030 cells transduced
with PTEN and non-targeting control shRNAs, and treated with ponatinib
(300 nM) in combination with trametinib (1, 5, and 25 nM). Data presented
as mean of two independent replicates. e, Quantification of AnnexinV/PI
double positive cells in H23 cells treated with trametinib (25 nM), ponatinib
(300 nM), SCH772984 (1 µM), or their combination for the times shown
(n = 3). f, Quantification of the relative growth of H23 cells treated with
trametinib alone or in combination with 500 nM crizotinib, gefitinib,
CP-724714, afatinib, or 300 nM ponatinib (n = 3). g, Immunoblot of H23
cells pre-treated with trametinib (25 nM) for 4 days, followed by treatment
with trametinib (25 nM) alone or in combination with crizotinib (1 µM),
gefitinib (1 µM), CP-724714 (1 µM), and ponatinib (750 nM) for 2 days.
c-f, Unpaired two-tailed t-test. Error bars, mean ± s.d. *P < 0.05, **P < 0.01,
***P < 0.001, ****P < 0.0001. For gel source data, see Supplementary Fig. 1.
Source Data for Fig. 3 are available in the online version of the paper.


of the MAPK and PI3K pathways contributes to its combinatorial activ- 
ity in KRAS-mutant lung cancer cells. 

We also tested whether other RTKs known to be reactivated after 
MAPK inhibition contribute to adaptive resistance to trametinib in 
KRAS-mutant lung cancer cells*?!1! 956 While trametinib treat- 
ment of H23 and H2030 cells increased MET and ERBB2 (but not 
EGFR) levels (Extended Data Fig. 9e-g), inhibitors targeting these
kinases did not synergize with trametinib under the conditions tested 
(Fig. 3f and Extended Data Figs 9h and 10a, b). Consistent with pre- 
vious reports!!, the dual EGFR/ERBB2 inhibitor afatinib also showed 
combinatorial activity with trametinib in some KRAS-mutant lung 
cancer lines, although in our hands less robustly than the trametinib 
and ponatinib combination (Fig. 3f and Extended Data Figs 9h 
and 10a, b). Accordingly, none of the agents tested prevented the 
rebound in ERK signalling after trametinib treatment (Fig. 3g and 
Figure 4 | Suppression of FGFR1 in combination with trametinib leads
to regression of KRAS-mutant lung tumours. a, Tumour volumes of mice
bearing A549 and H2122 xenografts, and JHU-LX55a patient-derived
xenograft tumours and treated with vehicle, trametinib (3 mg/kg body
weight), ponatinib (30 mg/kg body weight), or both drugs in combination
for the indicated times. Error bars, mean ± s.e.m. (n > 6 per treatment
group). b-d, Representative micro-computed tomography images of the
lungs of Kras G12D; Trp53−/− genetically engineered mice treated with
vehicle, trametinib (3 mg/kg body weight), ponatinib (ponat.) (30 mg/kg
body weight), or both drugs in combination for 3 and 7 weeks. Lung
tumours are indicated by yellow arrows; red asterisks mark the hearts (b).
A waterfall representation of the response for each tumour after 3 weeks of
treatment is shown (n > 5 per group) (c). Representative haematoxylin and
eosin stains are shown. Black asterisk indicates necrosis (d). e, Kaplan-
Meier survival analysis of mice bearing pancreatic tumours resulting from 
orthotopic transplantation of GEMM-KPC"®*’+ PDAC organoids and 
treated as in b (n > 4 per group) (log-rank test). a, c, Unpaired two-tailed 
t-test. *P< 0.05, **P< 0.01, ***P< 0.001, ****P < 0.0001. Source Data 
for Fig. 4 are available in the online version of the paper. 


Extended Data Fig. 10c, d). Thus, reactivation of FGFR1 signalling is a 
prominent mechanism of adaptive resistance to trametinib in KRAS- 
mutant lung cancer cells. 


In vivo effects of MEK/FGFR1 inhibition 

We validated our in vitro results in KRAS-mutant lung cancer xeno- 
grafts, a KRAS-mutant patient-derived xenograft, and a genetically 
engineered mouse model (GEMM) of Kras G12D-induced lung adenocarcinoma
that accurately resembles the human disease’’. A549 and
H23 xenografts harbouring tet-responsive FGFR1- or control-shRNAs 
were treated with doxycycline and a daily dose of 3 mg/kg body weight 
of trametinib when tumours reached ~150 mm³. While knockdown
of FGFR1 or treatment with trametinib alone had only minor anti- 
tumour effects, the combination of FGFR1 knockdown with trametinib 
potently inhibited tumour growth and typically caused tumour regres- 
sion (Extended Data Fig. 11a, b). Treatment of the xenografts, PDX and 
GEMM models with vehicle, trametinib, ponatinib, or the drug com- 
bination showed similar results, with only the combination producing 
marked tumour regressions despite no apparent toxicities (Fig. 4a-c
Figure 5 | Trametinib induces FGFR1 signalling in KRAS-mutant lung
tumours. a, Tumour tissue from JHU-LX55a patient-derived xenografts
treated with vehicle, trametinib (3 mg/kg body weight), ponatinib
(30 mg/kg body weight), or both drugs in combination for 18 days was
evaluated by IHC for phospho-FRS2, phospho-ERK, and phospho-AKT.
b, Paired tumour biopsies from patients having KRAS-mutant lung
adenocarcinomas (before and after treatment with the MEK inhibitor
trametinib) were evaluated by IHC for phospho-FRS2.


and Extended Data Fig. 11c-e). Moreover, histological analysis of the 
residual tumour mass in GEMMs treated with the drug combination 
showed massive necrosis, an effect not seen with either agent alone 
(Fig. 4d). Similar results were observed in an organoid-based, transplantable
model of Kras G12D-driven pancreatic cancer, in which the
drug combination produced marked cell death and significantly 
enhanced survival (Fig. 4e and Extended Data Fig. 11f). 

We also examined the ability of trametinib to induce FGFR1 signal- 
ling in KRAS-mutant tumours. Consistent with in vitro results, a KRAS- 
mutant lung PDX model showed a concomitant increase in FRS2, ERK, 
and AKT phosphorylation after trametinib treatment—an effect that 
was cancelled by ponatinib (Fig. 5a and Extended Data Fig. 11g). 
Furthermore, FRS2 phosphorylation was dramatically increased after 
trametinib treatment in two patients with KRAS-mutant lung adeno- 
carcinoma (Fig. 5b), indicating that the mechanism of adaptive resist- 
ance identified in our preclinical models is clinically relevant. 


Discussion 

In summary, by implementing a stringent approach for negative selec- 
tion shRNA screening, we identified feedback activation of FGFR1 sig- 
nalling as a prominent mechanism of adaptive resistance to the MEK 
inhibitor trametinib in KRAS-mutant lung cancer. The mechanism was 
specific: only shRNAs targeting FGFR1, but not other FGFR family 
members or other RTKs tested, conferred trametinib sensitivity, and 
only FGFR1 inhibition blocked compensatory reactivation of both ERK 
and AKT. In agreement, an unbiased ORF screen identified FGFR1, but 
not other RTKs, as sufficient to allow proliferation of KRAS-mutant 
colon cancer cells after KRAS suppression”. In our hands, the synergis- 
tic effects of the trametinib/FGFR inhibitor combinations were largely 
restricted to KRAS-mutant lung and pancreatic cancer cells, but not 
KRAS wild-type lung or KRAS-mutant colon cancer cells. These results 
strongly associate sensitivity to the combination with the magnitude 
of FRS2 phosphorylation after trametinib treatment alone and pro- 
vide a mechanistic foothold to predict and study cell line and tumour 
variability. 
Our results provide strong mechanistic support for combining trametinib
with FGFR1 inhibitors for treating KRAS-mutant lung cancer
and pinpoint a biomarker that might eventually be used to identify 
other patients likely to benefit from this drug combination. Although 
careful attention to additive or synergistic toxicities will be required 
for the clinical implementation of these findings, it seems likely that 
targeting a specific RTK such as FGFR1 will be more tolerable than tar- 
geting more pleiotropic factors such as AKT” and presents a rationale 
for developing more specific FGFR1 antagonists. Regardless, our study
provides further evidence that targeting adaptive resistance mecha- 
nisms can improve the efficacy of molecular targeted therapies and 
provides one path towards developing rational strategies for treating 
KRAS-mutant lung cancer. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Pooled negative-selection RNA interference screening. A custom shRNA 
library targeting 526 human kinases was designed using miR30-adapted DSIR 
predictions refined with ‘sensor’ rules*? (six shRNAs per gene) and constructed 
by PCR-cloning a pool of oligonucleotides synthesized on customized arrays 
(Agilent Technologies and CustomArray) as previously described (Supplementary 
Table 1)!”. The list of genes was obtained from the KinBase Database
(http://kinase.com/human/kinome/) and was manually curated. After sequence verification,
3,156 shRNAs (5-6 per gene) were combined with 20 positive- and negative- 
control shRNAs at equal concentrations in one pool. This pool was subcloned 
into the TRMPV-Neo vector and transduced in triplicates into Tet-on H23 KRAS- 
mutant lung cancer cells using conditions that predominantly led to a single 
retroviral integration and represented each shRNA in a calculated number of 
at least 1,000 cells. Transduced cells were selected for 6 days using 1 mg ml⁻¹
G418 (Invitrogen); at each passage more than 30 million transduced cells were
maintained to preserve library representation throughout the experiment.
After drug selection, T0 samples were obtained (~30 million cells per replicate
(n = 3)) and cells were subsequently cultured in the presence or absence of trametinib
(25 nM) and 1 µg ml⁻¹ doxycycline to induce shRNA expression. After
ten population doublings (Tf), about 15 million shRNA-expressing (dsRed+/Venus+)
cells were sorted for each replicate using a FACSAriaII (BD Biosciences).
Genomic DNA from T0 and Tf samples was isolated by two rounds of phenol
extraction using Phase Lock tubes (5 Prime) followed by isopropanol precipitation.
Deep-sequencing template libraries were generated by PCR amplification
of shRNA guide strands as previously described’”. Libraries were analysed on
an Illumina Genome Analyzer at a final concentration of 8 pM; 50 nucleotides
of the guide strand were sequenced using a custom primer (miR30EcoRISeq,
TAGCCCCTTGAATTCCGAGGCAGTAGGCA). To provide a sufficient baseline
for detecting shRNA depletion in experimental samples, we aimed to acquire >500
reads per shRNA in the T0 sample, which required more than 20 million reads
per sample to compensate for disparities in shRNA representation inherent to the
pooled plasmid preparation or introduced by PCR biases. With these conditions,
we acquired T0 baselines of >500 reads for 3,151 (97.9%) shRNAs. Sequence processing
was performed using a customized Galaxy platform*’.

Using selection criteria that required an shRNA depletion averaging greater 
than fourfold after ten population doublings and an effect greater than fourfold in 
trametinib-treated cells with respect to untreated ones, 64 shRNAs were identified. 
The eight targets for which at least two shRNAs were selectively depleted after tra- 
metinib treatment were subject to secondary validation in cell competition assays 
using multiple KRAS-mutant lung cancer cell lines. Six targets validated in the 
cell line in which the primary screen was performed (H23 cells) and four (BRAF, 
CRAF, ERK2, and FGFR1) across a panel of KRAS-mutant lung cancer cells; as 
such, these represented the top hits of our screen. 
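
The two-step selection described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (sequence processing was done on a customized Galaxy platform); the function name `call_hits`, the input layout, the pseudocount, and the toy read counts are all hypothetical. The sketch assumes that "depletion" means the ratio of normalized reads at T0 to reads at Tf in trametinib, and that the trametinib-specific effect is the ratio of vehicle to trametinib reads at Tf.

```python
from collections import defaultdict

def call_hits(counts, min_fold=4.0, min_shrnas_per_gene=2, pseudocount=1.0):
    """Return (hit shRNAs, genes supported by several hit shRNAs).

    counts maps an shRNA id to a dict with keys 'gene', 't0', 'vehicle'
    and 'trametinib' (normalized read counts; hypothetical layout).
    """
    hits = []
    for shrna, c in counts.items():
        tram = c["trametinib"] + pseudocount
        # Depletion over the time course under trametinib (T0 versus Tf).
        depletion = (c["t0"] + pseudocount) / tram
        # Selective effect: trametinib-treated versus untreated cells at Tf.
        selective = (c["vehicle"] + pseudocount) / tram
        if depletion > min_fold and selective > min_fold:
            hits.append(shrna)

    by_gene = defaultdict(list)
    for shrna in hits:
        by_gene[counts[shrna]["gene"]].append(shrna)
    # Keep only genes for which at least two independent shRNAs scored.
    candidates = {g: sorted(s) for g, s in by_gene.items()
                  if len(s) >= min_shrnas_per_gene}
    return hits, candidates

# Toy, made-up read counts for illustration only.
toy_counts = {
    "FGFR1.1": {"gene": "FGFR1", "t0": 1000, "vehicle": 900, "trametinib": 100},
    "FGFR1.2": {"gene": "FGFR1", "t0": 800, "vehicle": 700, "trametinib": 150},
    "RPA1.1": {"gene": "RPA1", "t0": 1000, "vehicle": 50, "trametinib": 40},
    "REN.1": {"gene": "REN", "t0": 1000, "vehicle": 1000, "trametinib": 950},
    "MET.1": {"gene": "MET", "t0": 1000, "vehicle": 900, "trametinib": 200},
}
toy_hits, toy_candidates = call_hits(toy_counts)
```

In this toy example the essential-gene control is depleted in both conditions and therefore fails the selectivity filter, while a gene with a single scoring shRNA is reported as a hit but not carried forward as a candidate.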

Plasmids and recombinant proteins. All vectors were derived from the Murine 
Stem Cell Virus (MSCV, Clontech) retroviral vector backbone. miR30- and mirE- 
based shRNAs were designed and cloned as previously described’ and sequences 
are available in Supplementary Table 1. shRNAs were cloned into the TRMPV-Neo 
(pSIN-TREdsRed-miR30-PGK-Venus-IRES-NeoR), LT3GEPIR (TRE3G-GFP- 
miRE-PGK-PuroR-IRES-rtTA3), and MLP (LTR-miR30-PGK-PuroR-IRES-GFP) 
vectors as previously described’. All constructs were verified by sequencing. 
Recombinant proteins FGF2 (8910, Cell Signaling), HGF (100-39, Peprotech), EGF 
(AF-100-15, Peprotech), and NRG1 (100-03, Peprotech) were used at 50 ng ml⁻¹
for 10 min. 

Cell culture, compounds, and competitive proliferation assays. H23, H460, 
H2030, H358, H2122, H820, H3255, and A549 cells were provided by R. Somwar 
and H. Varmus. H2009, HCT116, SW480, SW620, DLD-1, PaTu 8988t, 3T3, 
MIAPACA-2, and PANC-1 cells were purchased from the American Type Culture 
Collection (ATCC). H69, H82, HCC-33, and H446 were provided by C. Rudin. 
H1975, H1650, Ludlu-1, H1703, PC-14, H2170, SK-MES-1, H520, H522, EBC-1, 
HCC-15, H441, A-427, and H1299 cells were provided by M. Sanchez-Cespedes. 
Cell lines were not authenticated. Murine KRAS°!”9; p53®7H cells were derived 
from a murine lung adenocarcinoma. Cells were maintained in a humidified incubator
at 37 °C with 5% CO2, grown in RPMI 1640 or DMEM supplemented with
10% FBS and 100 IU ml⁻¹ penicillin/streptomycin. All cell lines used were negative
for mycoplasma. 

Trametinib (S2673), SCH772984 (S7101), Gefitinib (S1025), Crizotinib (S1068),
CP-724714 (S1167), Afatinib (S1011), BGJ398 (S2183), AZD4547 (S2801), and
Ponatinib (S1490) were obtained from Selleckchem. Drugs for in vitro studies
were dissolved in DMSO to yield 5 or 10 mM stock solutions and stored at −80 °C.

For shRNA experiments, when necessary, human cancer cells were modified 
to express the ecotropic receptor and rtTA3 by retroviral transduction of MSCV- 
RIEP (MSCV-rtTA3-IRES-EcoR-PGK-Puro) followed by drug selection (1 µg ml⁻¹
puromycin for 1 week). Cell lines were transduced with ecotropically packaged
TRMPV-Neo-shRNA retroviruses or, alternatively, with amphotropically packaged
LT3GEPIR-Puro-shRNA lentiviruses, selected with 1 mg ml⁻¹ G418 or 1 µg ml⁻¹
puromycin for 1 week, and treated with 1 µg ml⁻¹ doxycycline to induce shRNA
expression. 

For competitive proliferation assays, shRNA-transduced cells were mixed 
with non-transduced cells (8:2) and cultured with doxycycline in the presence 
or absence of trametinib (25 nM). The relative percentage of Venus+/dsRed+ or
GFP+ cells was determined before (T0, blue bars) and after ten population
doublings (Tf) (results are relative to T0) (Tf on doxycycline, grey bars; Tf on
dox + trametinib, red bars). The quantification of fluorescent cells was monitored 
on a Guava Easycyte (Millipore). Experiments were performed independently 
two or three times. 
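
As a small worked example of the readout described above, the sketch below expresses the percentage of fluorescent (shRNA-expressing) cells at Tf relative to T0. The function name and the example percentages are hypothetical; only the normalization to T0 is taken from the text.

```python
def relative_fluorescent_fraction(pct_t0, pct_tf):
    """Percentage of fluorescent cells at Tf expressed relative to T0."""
    if pct_t0 <= 0:
        raise ValueError("T0 percentage must be positive")
    return pct_tf / pct_t0

# Made-up values: cells were mixed 8:2, so roughly 80% are fluorescent at T0.
rel_dox = relative_fluorescent_fraction(80.0, 72.0)       # doxycycline only
rel_dox_tram = relative_fluorescent_fraction(80.0, 20.0)  # dox + trametinib
```

A value near 1 indicates that the shRNA confers no competitive disadvantage, whereas a value well below 1 under doxycycline plus trametinib, but not doxycycline alone, indicates selective sensitization.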

Lentiviral production. Lentiviruses were produced by co-transfection of 293T 
cells with lentiviral-Cre backbone construct and packaging and envelope vectors 
(psPAX2 and VSV-G), using the calcium phosphate method. Supernatant was 
collected 48, 60, and 72h after transfection, concentrated by ultracentrifugation 
at 24,000 r.p.m. for 120 min and resuspended in an appropriate volume of HBSS 
solution (Gibco). 

Clonogenic and apoptosis assay. For clonogenic assays, cells were seeded in trip- 
licate into six-well plates (5 x 10° to 10 x 10° cells per well) and allowed to adhere 
overnight in regular growth media. Cells were then cultured in the absence or 
presence of drug as indicated in complete media for 10-14 days. Growth media 
with or without drug was replaced every 2 days. Remaining cells were fixed with 
methanol (1%) and formaldehyde (1%), stained with 0.5% crystal violet, and pho- 
tographed using a digital scanner. Relative growth was quantified by densitometry 
after extracting crystal violet from the stained cells using 10% of acetic acid. All 
experiments were performed at least three times. Representative experiments are 
shown. 

For apoptosis assays, around 1 x 10° cells were seeded into 10-cm plates and 

cultured in the presence or absence of drugs as indicated. After 6 days, apopto- 
sis and cell death were determined using AnnexinV-APC apoptosis detection kit 
according to the manufacturer’s instructions (Affymetrix eBioscience). Data were 
acquired using a FACS Calibur (BD Biosciences). All experiments were performed 
independently three times. 
Quantitative analysis of drug synergy and determination of fold change in 
sensitivity to trametinib. Drug synergism was analysed using CompuSyn soft- 
ware (version 1.0) (http://www.combosyn.com), which is based on the medi- 
an-effect principle (Chou) and the combination index—isobologram theorem 
(Chou-Talalay)**. CompuSyn software generates combination index (CI) values, 
where CI < 0.75 indicates synergism, CI = 0.75-1.25 indicates additive effects, 
and CI > 1.25 indicates antagonism. Following the instruction of the software, 
drug combinations at non-constant ratios were used to calculate the combination 
index in our study. 
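
For orientation, the combination index can be sketched in a few lines of Python. This is not CompuSyn itself; it assumes the standard median-effect equation, in which the dose of a single drug producing affected fraction fa is Dx = Dm (fa / (1 - fa))^(1/m), and the usual two-drug index CI = d1/Dx1 + d2/Dx2. The function names and the example parameters (Dm, m, doses) are hypothetical; only the CI cut-offs are taken from the text above.

```python
def dose_for_effect(dm, m, fa):
    """Median-effect equation: single-agent dose giving affected fraction fa.

    dm is the median-effect dose and m the slope of the median-effect plot.
    """
    if not 0.0 < fa < 1.0:
        raise ValueError("fa must lie strictly between 0 and 1")
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)

def combination_index(d1, d2, fa, dm1, m1, dm2, m2):
    """CI for doses d1 and d2 that together produce affected fraction fa."""
    return d1 / dose_for_effect(dm1, m1, fa) + d2 / dose_for_effect(dm2, m2, fa)

def classify_ci(ci):
    """Apply the cut-offs stated in the text (0.75 and 1.25)."""
    if ci < 0.75:
        return "synergism"
    if ci <= 1.25:
        return "additive"
    return "antagonism"

# Made-up example: 25 nM of drug 1 plus 300 nM of drug 2 give 50% inhibition,
# while the single agents need 100 nM and 1,000 nM, respectively.
example_ci = combination_index(d1=25.0, d2=300.0, fa=0.5,
                               dm1=100.0, m1=1.0, dm2=1000.0, m2=1.0)
example_call = classify_ci(example_ci)
```

In the example, each drug is used at a fraction of its single-agent equipotent dose (0.25 and 0.30), so the index sums to 0.55 and falls in the synergism range.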

For calculating the fold change in sensitivity to trametinib, the concentration 
of trametinib that inhibited cell proliferation by 50% (GI50) was determined for a
panel of KRAS wild-type and mutant cancer cell lines in the absence or presence 
of ponatinib and AZD4547. Experiments were performed independently twice. 
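
The fold change in sensitivity can be illustrated as follows. The text does not state how GI50 values were fitted, so this sketch assumes simple log-linear interpolation between the two tested doses that bracket 50% growth; the function name and the dose-response values are hypothetical.

```python
import math

def gi50(doses, growth):
    """Interpolate the dose giving 50% of control growth.

    doses: ascending concentrations (nM); growth: fraction of untreated
    growth at each dose. Interpolation is linear in log10(dose), which is
    an assumption of this sketch rather than the method used in the study.
    """
    for i in range(len(doses) - 1):
        g_lo, g_hi = growth[i], growth[i + 1]
        if g_lo >= 0.5 >= g_hi and g_lo != g_hi:
            frac = (g_lo - 0.5) / (g_lo - g_hi)
            log_lo, log_hi = math.log10(doses[i]), math.log10(doses[i + 1])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    raise ValueError("growth does not cross 50% within the tested range")

# Made-up trametinib dose-response curves (nM), alone and with an FGFR inhibitor.
doses_nm = [1.0, 10.0, 100.0, 1000.0]
gi50_alone = gi50(doses_nm, [0.95, 0.8, 0.6, 0.4])
gi50_combo = gi50(doses_nm, [0.9, 0.6, 0.4, 0.2])
fold_change = gi50_alone / gi50_combo
```

With these toy curves the GI50 shifts from roughly 316 nM to roughly 32 nM, a tenfold increase in sensitivity to trametinib.
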
Immunoblotting and RAS-GTP assay. Phospho-lysis buffer (50 mM Tris 
pH 7.5, 1% Tween-20, 200 mM NaCl, 0.2% NP-40) supplemented with phosphatase 
inhibitors (5 mM sodium fluoride, 1 mM sodium orthovanadate, 1 mM sodium 
pyrophosphate, 1 mM β-glycerophosphate) and protease inhibitors (Protease 
Inhibitor Cocktail Tablets, Roche) was used for cell lysis, and protein concentration 
was determined using a Bradford protein assay kit (Bio-Rad). Proteins were separated 
by SDS-PAGE and transferred to polyvinylidene difluoride (PVDF) 
membranes (Millipore) according to standard protocols. Membranes were immu- 
noblotted with antibodies against pERK^T202/Y204 (9101), tERK (9107), pAKT^S473 
(4060), tAKT (9272), pFRS2^Y436 (3861), pSTAT3^Y705 (9145), pMEK^S217/221 (9154), 
MEK (4694), pMET^Y1234/1235 (3077), MET (8198), pERBB2^Y1221/1222 (2243), 
pEGFR^Y1068 (3777), EGFR (4267), pERBB3^Y1289 (4791), and PTEN (9559) from Cell 
Signaling; CRAF (SC-227) and BRAF (SC-5284) from Santa Cruz Biotechnology; 
and KRAS (WH0003845M1) from Sigma in 5% BSA in TBS blocking buffer. 
After primary antibody incubation, membranes were probed with ECL anti- 
rabbit IgG, anti-mouse IgG or anti-goat IgG secondary antibody (1:10,000) from 
GE Healthcare Life Sciences and imaged using a FluorChem M system (ProteinSimple). 
GTP-bound RAS was measured using a CRAF RAS-binding-domain (RBD) 
pull down and detection kit (8821, Cell Signaling) as instructed by the manufac- 
turer. All immunoblots were performed independently at least twice. 
qRT-PCR. Total RNA was isolated using TRIZOL (Invitrogen), and complemen- 
tary DNA was obtained using the TaqMan reverse transcription reagents (Applied 
Biosystems). Real-time PCR was performed in triplicate in three independent 
experiments using SYBR Green PCR Master Mix (Applied Biosystems) on the 
ViiA 7 Real-Time PCR System (Invitrogen). GAPDH or β-actin served as endog- 
enous normalization controls. 

Animal studies. All mouse experiments were approved by the Memorial Sloan 
Kettering Cancer Center (MSKCC) Animal Care and Use Committee (proto- 
col number 12-04-006). Mice were maintained under specific pathogen-free 
conditions, and food and water were provided ad libitum. Female 5- to 7-week-old 
athymic NCR-NU-NU (Harlan Laboratories) mice were used for animal experi- 
ments with human cell lines and patient-derived xenografts. For A549, H23, and 
H2122 xenografts, cells (10 x 10°) were harvested on the day of use and injected 
in growth-factor-reduced Matrigel/PBS (50% final concentration). One flank was 
injected subcutaneously per mouse. For JHU-LX55a patient-derived xenograft, 
a poorly differentiated lung adenocarcinoma bearing a KRAS^G12C mutation, tumours 
were cut into pieces and inserted into a pocket in the subcutaneous space as pre- 
viously described**. After inoculation, mice were monitored daily and weighed twice 
weekly, and calliper measurements began when tumours became visible. Tumour 
volume was calculated using the following formula: tumour volume = (D × d²)/2, 
in which D and d refer to the long and short tumour diameter, respectively. When 
tumours reached a size of 150-300 mm³, mice were randomized into five to eight 
per group and treated with vehicle, trametinib, and/or ponatinib per os for 4 con- 
secutive days followed by 3 days off treatment, at 3 mg/kg body weight and 30 mg/ 
kg body weight, respectively. No obvious toxicities were observed in the vehicle- or 
drug-treated animals as assessed by difference in body weight between vehicle- and 
drug-treated mice taking tumour size into account. For immunohistochemistry 
analysis of JHU-LX55a patient-derived xenograft tumours, tumours were har- 
vested 4h after dosing on day 18. 
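
The tumour-volume formula, the enrolment window and the 4-days-on/3-days-off schedule described above can be sketched as follows. The function names are hypothetical, and the day numbering (day 1 = first day of treatment) is an assumption, since the text does not define it.

```python
def tumour_volume(long_d_mm: float, short_d_mm: float) -> float:
    """Tumour volume in mm^3 from calliper diameters: (D x d^2) / 2."""
    if long_d_mm <= 0 or short_d_mm <= 0:
        raise ValueError("diameters must be positive")
    if short_d_mm > long_d_mm:
        raise ValueError("short diameter cannot exceed long diameter")
    return long_d_mm * short_d_mm ** 2 / 2


def eligible_for_randomization(volume_mm3: float,
                               low: float = 150.0,
                               high: float = 300.0) -> bool:
    """True when the tumour is within the 150-300 mm^3 enrolment window."""
    return low <= volume_mm3 <= high


def is_dosing_day(day: int) -> bool:
    """4 consecutive dosing days, then 3 days off; day 1 is assumed to be the first dose."""
    if day < 1:
        raise ValueError("day numbering starts at 1")
    return (day - 1) % 7 < 4


# Hypothetical measurement: D = 10 mm, d = 6 mm.
volume = tumour_volume(10.0, 6.0)
```

Under the assumed numbering, day 18 falls on a dosing day, which is consistent with tumours being harvested 4 h after dosing on that day.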

For drug efficacy studies using a GEMM of lung cancer, Kras 
Trp53™" mice (8-12 weeks old) were anaesthetized by intraperitoneal injection of 
ketamine (80 mg per kg body weight) and xylazine (10 mg per kg body weight) and 
infected intratracheally with 2.5 x 10° infectious particles of Lenti-Cre per mouse, 
as previously described*'. Mice were evaluated by micro-computed tomography 
imaging to quantify lung tumour burden before being assigned to various treat- 
ment study cohorts. Mice were treated with vehicle, trametinib and/or ponatinib 
per os for 4 consecutive days followed by 3 days off treatment, at 3 mg/kg body 
weight and 30 mg/kg body weight, respectively. Micro-computed tomography 
imaging evaluation was repeated every week during the treatment. Investigators 
were not blinded with respect to treatment. 

For drug efficacy studies using an organoid-derived murine model of pancreatic 
cancer, spherical, duct-like organoids were derived and cultured in Matrigel and 
defined media as previously described** from pancreatic ductal adenocarcinoma 
(PDAC) occurring in Kras^LSL-G12D/+; Trp53^fl/+; CHC (untargeted collagen homing 
cassette); RIK (Rosa26-LSL-rtTa3-IRES-Kate2); p48Cre mice (GEMM-KPC^fl/+) 
generated via the PDAC-GEMM-ESC approach”. After initially establishing pri- 
mary organoid cultures, Kate-positive cells were sorted and expanded to minimize 
injection of non-recombined, normal duct cells. For the orthotopic transplantation 
of PDAC organoids, mice were anaesthetized using isoflurane, and the pancreas 
was externalized through a small incision made in the left abdominal side near the 
spleen. Organoids (approximately 250,000-500,000 cells per mouse) were removed 
from Matrigel and separated into single cells by trypsinization, washed, and finally 
resuspended in 25 µl of Matrigel (BD) diluted 1:1 with cold PBS. The organoid 
suspension was injected into the tail region of the pancreas using 28-gauge sur- 
gical syringes (Hamilton). Successful injection was verified by the appearance of 
a fluid bubble without signs of intraperitoneal leakage. The abdominal wall was 
sutured with absorbable Vicryl suture (Ethicon), and the skin was closed with 
wound clips (CellPoint Scientific). Mice were evaluated by ultrasound (Vevo 2100, 
VisualSonics) to quantify pancreas tumour burden before being randomized to 
various treatment study cohorts. All treated mice had similar initial tumour 
burden. Mice were treated as described above for drug efficacy studies using a 
GEMM of lung cancer. Investigators were not blinded with respect to treatment. 
Micro-computed tomography imaging. Micro-computed tomography scans were 
performed on a Mediso Nano SPECT/CT System covering only the lung fields of 
each mouse. Each scan averaged approximately 6 min using 240 projections with 
an exposure time of 1,000 ms set at a pitch of 1°. The tube energy of the X-ray was 
55 kVp and 145 µA. The in-plane voxel sizes chosen were small and thin, creating 
a voxel size of 73 µm × 73 µm × 73 µm. The final reconstructed image consisted of 
368 voxels x 368 voxels x 1,897 voxels. Scans were analysed with Osirix software. 
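
As a quick arithmetic check, the physical extent of the reconstructed volume follows directly from the voxel size and grid dimensions given above. The sketch below only multiplies the reported numbers; the variable and function names are hypothetical.

```python
VOXEL_UM = 73.0           # isotropic voxel edge length reported in the text
GRID = (368, 368, 1897)   # reconstructed image dimensions in voxels


def extent_mm(n_voxels: int, voxel_um: float = VOXEL_UM) -> float:
    """Physical length covered by n_voxels, in millimetres."""
    return n_voxels * voxel_um / 1000.0


# Extent of the reconstructed volume along each axis, in mm.
fov_mm = tuple(extent_mm(n) for n in GRID)
```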
Patients’ samples. Patients with KRAS mutation-positive advanced lung adenocar- 
cinomas were enrolled in the phase I/II clinical study of trametinib and navitoclax 
(NCT02079740) and the response was assessed per RECIST (response evaluation 
criteria in solid tumours) criteria. Biopsies were obtained before treatment, and 
within 2-4 weeks after starting the treatment with trametinib. Specifically, for 
patient 1, the post-treatment biopsy was obtained after treatment with navitoclax 
for 7 days, followed by co-treatment with navitoclax and trametinib for 16 days. 
The post-treatment biopsy from patient 2 was obtained after co-treatment with 
navitoclax and trametinib for 22 days. All human studies were approved by the 
Massachusetts General Hospital Institutional Review Board, and informed consent 
to study was obtained as per protocol from all patients. 

Immunohistochemistry. Tissues were fixed overnight in 4% paraformaldehyde, 
embedded in paraffin, and cut into 5 µm sections. Sections were subjected 
to haematoxylin and eosin staining, and immunohistochemical staining 
following standard protocols. The following primary antibodies were used: 
pERK^T202/Y204 (4370) and pAKT^S473 (4060) (Cell Signaling), and pFRS2^Y436 
(ab193363) (Abcam). 

Statistical analysis. Data are expressed as mean ± s.e.m. or mean ± s.d. Group 
size was determined on the basis of the results of preliminary experiments and no 
statistical method was used to predetermine sample size. The indicated sample 
size (n) represents biological replicates. Group allocation and outcome assessment 
were not performed in a blinded manner. All samples that met proper experi- 
mental conditions were included in the analysis. Survival was measured using 
the Kaplan-Meier method. Statistical significance was determined by Student's 
t-test, log-rank test, and Pearson's correlation using Prism 6 software (GraphPad 
Software). Significance was set at P < 0.05. 
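
The asterisk notation used in the figure legends maps P values to significance levels. A minimal sketch of that mapping is given below; the function name and the "n.s." label for non-significant results are assumptions, while the thresholds are those quoted in the legends and in this section.

```python
def significance_stars(p: float) -> str:
    """Return the asterisk label for a P value, using the thresholds in the legends."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    thresholds = ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*"))
    for threshold, stars in thresholds:
        if p < threshold:
            return stars
    return "n.s."  # assumed label; the text only states that significance is P < 0.05


# Hypothetical P values, for illustration only.
example_labels = [significance_stars(p) for p in (0.2, 0.03, 0.004, 0.0002, 0.00001)]
```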
Extended Data Figure 1 | A synthetic lethal RNA interference screen 
identifies different MAPK signalling effectors and FGFR1 as sensitizers 
to MEK inhibition in KRAS-mutant lung cancer cells. a, Library 
features and schematic of the TRMPV-Neo vector. b, Schematic outline 
of the synthetic lethal RNA interference screen for identifying sensitizers 
to trametinib in KRAS-mutant lung cancer cells. c, Clonogenic assay of 
KRAS-mutant lung cancer cell lines (H23, H460, and H2030) cultured in 
the presence of increasing concentrations of trametinib. d, Proliferation 
assay of H23 and H2030 cells in the presence of increasing concentrations 
of trametinib for four passages. Data presented as mean of two independent 
replicates. e, Immunoblot analysis of KRAS-mutant lung cancer cell lines 
treated with 25 nM of trametinib for 48 h. f, g, Scatter plots illustrating 
the correlation of normalized reads per shRNA between replicates at the 
beginning of the experiment (d) and replicates at different time points in 
the absence (left) or presence (right) of trametinib (25 nM) (e). h, Scatter 
plot illustrating the fold change in the relative abundance of each shRNA in 
the library after ten population doublings on doxycycline in the absence or 
presence of trametinib (25 nM) in H23 cells. Two shRNAs for FGFR1, CRAF, 
BRAF, and ERK2 were identified as selectively depleted in trametinib- 
treated cells. For gel source data, see Supplementary Fig. 1. 
Extended Data Figure 2 | Suppression of FGFR1 and different MAPK 
signalling effectors reduces the proliferation and viability of KRAS- 
mutant lung cancer cells treated with trametinib. a, Quantification of 
fluorescent cells in competitive proliferation assays in H2030 (upper) and 
A549 (lower) cells transduced with non-targeting control (Ren) or the 
indicated shRNAs. Data presented as mean (n = 2). Unpaired two-tailed 
t-test. *P < 0.05, **P < 0.01. b, Immunoblot of H23 and H2030 cells pre- 
treated with 25 nM trametinib for various times and subsequently treated 
with 200 nM trametinib for 2 h. c, Immunoblot of H23 cells transduced 
with doxycycline-inducible shRNAs targeting CRAF and BRAF and treated 
with trametinib (25 nM) and doxycycline for the times shown. H23 
cells were pre-treated with trametinib for 4 days, followed by treatment 
with doxycycline and trametinib for 4 days. d, Clonogenic assay of H23 
cells transduced with BRAF, CRAF, ERK2, and non-targeting control 
shRNAs, and cultured with DMSO or trametinib (25 nM) for 10 days. 
Relative growth of DMSO- (grey bars) and trametinib-treated cells 
(blue and red bars) is shown (right). Data presented as mean ± s.d. (n = 3). 
For gel source data, see Supplementary Fig. 1. 
concentration of the drugs in H23, H2030, and H460 cells is presented 
(right). Data presented as mean of three independent experiments (n = 3). 
b, Immunoblot analysis of H2030 cells treated with trametinib (25 nM), 
of increasing concentrations of SCH772984 (bottom). For gel source 
data, see Supplementary Fig. 1. Source Data for Extended Data Fig. 3 are 
available in the online version of the paper. 
to adaptive resistance to trametinib in KRAS-mutant lung cancer cells. 
a, Immunoblot analysis of KRAS-mutant lung cancer cell lines H23 and 
H2030 treated with 25 nM trametinib for various times. b-d, qRT-PCR for 
FGFR1 and FGF2 in A549 (b), H2030 (c), and H460 (d) cells treated with 
trametinib for the indicated times. Data presented as mean normalized 
for FGFR1 and FGF2 expression ± s.d. (n = 3). e, Immunoblot analysis of 
A549, H2030, and H358 cells treated with trametinib (25 nM) for various 
times. f, Quantification of fluorescent cells in competitive proliferation 
assays in A549, H358, and H460 cells transduced with doxycycline- 
inducible non-targeting control (Ren) or FGFR1 shRNAs. Data presented 


with non-targeting control and FGFR1 shRNAs. Data presented as 
mean normalized for FGFR1 expression ± s.d. (n = 3). h, Quantification 
of fluorescent cells in competitive proliferation assays in A549 cells 
transduced with non-targeting control (Ren) or the indicated shRNAs. 
Data presented as mean ± s.d. (n = 3). i, qRT-PCR for FGFR2, FGFR3, 
and FRS2 in A549 cells transduced with non-targeting control, FGFR2, 
FGFR3, and FRS2 shRNAs. Data presented as mean normalized for 
FGFR2, FGFR3, and FRS2 expression ± s.d. (n = 3). b-d, Paired two-tailed 
t-test. f-i, Unpaired two-tailed t-test. *P < 0.05, **P < 0.01, ***P < 0.001, 
****P < 0.0001. For gel source data, see Supplementary Fig. 1. 
synergizes at inhibiting cell proliferation of KRAS-mutant lung cancer 
analysis for the indicated antibodies is shown. b, Immunoblot analysis of 
combination for the times shown. Cells were pre-treated with trametinib for 
4 days, followed by co-treatment with ponatinib and trametinib for 2 days. 
Extended Data Figure 8 | The magnitude of trametinib-induced 
FRS2 phosphorylation correlates with the sensitivity to trametinib 
and FGFR1 combined inhibition in human cancer cells. a, Dot plot 
illustrating the sensitivity increase to trametinib after the treatment 
with AZD4547 (2.5 µM) in a panel of KRAS-mutant (n = 15) and KRAS 
wild-type (n = 15) cancer cell lines. Data presented as mean of two 
independent replicates (n = 2). b, Scatter plot illustrating the correlation 
between fold increase in sensitivity to trametinib after treatment with 
AZD4547 (2.5 µM) or ponatinib (100 nM) and fold change in FRS2 
phosphorylation after trametinib treatment in a panel of human cancer 
cell lines. c, Immunoblot analysis of a panel of human cancer cells 
treated with trametinib (25 nM) for 6 days. a, Unpaired two-tailed 
t-test. b, Two-tailed Pearson’s correlation. **P < 0.01, ***P < 0.001, 
****P < 0.0001. For gel source data, see Supplementary Fig. 1. 
Extended Data Figure 9 | Ponatinib prevents trametinib-induced 
reactivation of MAPK and PI3K signalling. Upregulation of distinct 
RTKs in KRAS-mutant lung cancer cells after trametinib treatment. 
a, Immunoblot analysis of H2030 transduced with PTEN and non- 
targeting control shRNAs, and treated with trametinib (25 nM) for the 
times shown. b, Clonogenic assay of H2030 (left) and H460 (middle) cells 
transduced with PTEN and non-targeting control shRNAs. Cells were 
treated with ponatinib alone (300 nM) or in combination with trametinib 
at the indicated concentrations. Quantification of the relative cell growth 
of H460 cells is shown (right). Data presented as mean of two independent 
experiments. c, Immunoblot analysis of H2030 transduced with PTEN and 
non-targeting control shRNAs, and treated with trametinib (25 nM) alone 
or in combination with ponatinib (750 nM) for the times shown. PTEN 
suppression did not affect ERK signalling or its inhibition after trametinib 
treatment but instead activated AKT and, more importantly, attenuated 
the ability of ponatinib to suppress trametinib-induced increase in pAKT. 
d, AnnexinV/PI double staining assay of H23 cells treated with vehicle, 
trametinib (25 nM) alone or in combination with ponatinib (300 nM) or 
SCH772984 (1 µM) for the times shown (n = 3). e, f, qRT-PCR for EGFR, 
MET, and ERBB2 in H23 (e) and H2030 (f) cells treated with trametinib 
for 0, 2, and 4 days. Data presented as mean normalized for EGFR, MET, 
and ERBB2 expression ± s.d. (n = 3). g, Immunoblot analysis of H23 cells 
treated with 25 nM of trametinib for various times. h, Immunoblot 
analysis of serum-starved H2030 cells pre-treated with 500 nM or 1 µM 
of gefitinib, crizotinib, CP-724714, or afatinib for 12 h, followed by 
stimulation with EGF, HGF, NRG1, or their combination (50 ng ml⁻¹) 
for 10 min. b, e, f, Unpaired two-tailed t-test. *P < 0.05, **P < 0.01, 
***P < 0.001. For gel source data, see Supplementary Fig. 1. 
Extended Data Figure 10 | Unresponsiveness of KRAS-mutant lung 
cancer cells to MEK inhibitor trametinib is predominantly mediated by 
feedback activation of FGFR1 signalling. a, Clonogenic assay of H23 and 
H2030 cells treated with increasing concentration of trametinib alone or 
in combination with 500 nM crizotinib, gefitinib, CP-724714, and afatinib, 
or 300 nM ponatinib. Percentage inhibition at each concentration of the 
drugs in H23, H460, and H2030 cells is presented (right). Data presented 
as mean of at least two independent experiments (n = 2). b, CI (combination 
index) scores for H23, H460, and H2030 cells treated with trametinib in 
combination with crizotinib, gefitinib, CP-724714, afatinib, and ponatinib 
at the indicated concentrations. Each CI score represents data from at least 
two independent experiments (n = 2). c, Immunoblot of H23 and H2030 
treated with trametinib (25 nM), crizotinib (1 µM), gefitinib (1 µM), 
CP-724714 (1 µM), and ponatinib (750 nM) for 48 h. d, Immunoblot 
analysis of H2030 treated with trametinib (25 nM), crizotinib (1 µM), 
gefitinib (1 µM), CP-724714 (1 µM), ponatinib (750 nM), or their 
combination for the times shown. Cells were pre-treated with trametinib 
for 4 days, followed by co-treatment with RTK inhibitors and trametinib 
for 2 days. For gel source data, see Supplementary Fig. 1. Source Data for 
Extended Data Fig. 10 are available in the online version of this paper. 
Extended Data Figure 11 | Suppression of FGFR1 cooperates with 
trametinib to inhibit growth of KRAS-mutant lung tumours. a, b, Mice 
bearing H23 (a) or H2030 (b) xenografts transduced with FGFR1 or non- 
targeting control shRNAs were treated with either vehicle or trametinib 
(3 mg/kg body weight). For H23 xenografts, a waterfall representation 
of the best response for each tumour is shown (n = 8 per group) (a). For 
H2030 xenografts, the tumour volumes are shown as a function of time 
after treatment. Error bars, mean ± s.e.m. (n > 4 per group) (b). c, Mice 
bearing A549 and H2122 xenografts, and JHU-LX55a patient-derived 
xenograft tumours were treated with vehicle, trametinib (3 mg/kg body 
weight), ponatinib (30 mg/kg body weight), or both drugs in combination. 
A waterfall representation of the best response for each tumour is shown 
(n > 6 per group). d, Body weight of mice bearing A549 xenografts and 
treated with vehicle, trametinib (3 mg/kg body weight), ponatinib (30 mg/kg 
body weight), or both drugs in combination for the indicated times 
(n > 6 per group). e, Kras^G12D; Trp53^−/− genetically engineered mice 
harbouring lung adenocarcinomas were treated with vehicle, trametinib 
(3 mg/kg body weight), ponatinib (30 mg/kg body weight), or both drugs 
in combination for 7 weeks. A waterfall representation of the response for 
each tumour after 7 weeks of treatment is shown (n > 5). f, Representative 
haematoxylin and eosin stains of pancreatic tumour tissue resulting from 
orthotopic transplantation of GEMM-KPC^fl/+ PDAC organoids. Mice 
were treated with vehicle, trametinib (3 mg/kg body weight), ponatinib 
(30 mg/kg body weight), or both drugs in combination. Black asterisk 
indicates necrosis. g, Immunoblot analysis of tumour tissue from mice 
bearing JHU-LX55a patient-derived xenografts treated with vehicle, 
trametinib (3 mg/kg body weight), ponatinib (30 mg/kg body weight), or 
both drugs in combination for 18 days. a-c, e, Unpaired two-tailed 
t-test. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. For gel source 
data, see Supplementary Fig. 1. Source Data for Extended Data Fig. 11 are 
available in the online version of this paper. 
The landscape of accessible chromatin in mammalian preimplantation embryos 


Jingyi Wu, Bo Huang, He Chen, Qiangzong Yin, Yang Liu, Yunlong Xiang, Bingjie Zhang, Bofeng Liu, Qiujun Wang, 
Weikun Xia, Wenzhi Li, Yuanyuan Li, Jing Ma, Xu Peng, Hui Zheng, Jia Ming, Wenhao Zhang, Jing Zhang, Geng Tian, 
Feng Xu, Zai Chang, Jie Na, Xuerui Yang & Wei Xie 


In mammals, extensive chromatin reorganization is essential for reprogramming terminally committed gametes to a 
totipotent state during preimplantation development. However, the global chromatin landscape and its dynamics in this 
period remain unexplored. Here we report a genome-wide map of accessible chromatin in mouse preimplantation embryos 
using an improved assay for transposase-accessible chromatin with high throughput sequencing (ATAC-seq) approach 
with CRISPR/Cas9-assisted mitochondrial DNA depletion. We show that despite extensive parental asymmetry in DNA 
methylomes, the chromatin accessibility between the parental genomes is globally comparable after major zygotic genome 
activation (ZGA). Accessible chromatin in early embryos is widely shaped by transposable elements and overlaps extensively 
with putative cis-regulatory sequences. Unexpectedly, accessible chromatin is also found near the transcription end 
sites of active genes. By integrating the maps of cis-regulatory elements and single-cell transcriptomes, we construct the 
regulatory network of early development, which helps to identify the key modulators for lineage specification. Finally, 
we find that the activities of cis-regulatory elements and their associated open chromatin diminished before major ZGA. 
Surprisingly, we observed many loci showing non-canonical, large open chromatin domains over the entire transcribed 
units in minor ZGA, supporting the presence of an unusually permissive chromatin state. Together, these data reveal a 
unique spatiotemporal chromatin configuration that accompanies early mammalian development. 


The state of chromatin dictates fundamental cellular processes includ- 
ing gene expression, DNA replication and DNA repair’. It has long 
been observed that accessible chromatin marks regulatory sequences 
such as promoters, enhancers, insulators and the locus-control 
regions”. These elements interact with cell-type specific transcription 
factors to execute transcriptional programs that instruct cell fate deter- 
mination and development’. Early animal embryos undergo extensive 
reprogramming and chromatin remodelling to allow the conver- 
sion of terminally differentiated gametes to totipotent/pluripotent 
cells*. However, it remains to be determined how mammalian chro- 
matin reconfigures and regulates the transcription programs in pre- 
implantation development. It is also unclear whether the dynamics 
of chromatin are associated with other epigenomic reprogramming 
events such as global DNA demethylation®. Moreover, the roles of 
cis-regulatory elements in early development are poorly understood. 
Unfortunately, these questions are difficult to address owing to the 
limited experimental materials available. Recently, the highly efficient 
approach ATAC-seq (an assay for transposase-accessible chromatin 
using sequencing) was developed, which can probe open chromatin 
using as little as single cells*®. Accessible regions detected by ATAC- 
seq colocalize extensively with regulatory elements such as enhancers 
and promoters®™!", Here, we use an improved ATAC-seq approach to 
interrogate genome-wide accessible regions in mouse preimplantation 
embryos. As described below, our integrative analyses provide a com- 
prehensive view of the spatiotemporal chromatin configuration that 
accompanies early embryogenesis. 


Mapping accessible chromatin in early embryos 

We sought to investigate accessible chromatin in mouse preimplantation 
embryos using ATAC-seq (Fig. 1a). First, we confirmed that ATAC-seq 
using 200 and 1,000 mouse embryonic stem cells (mESCs) recapitu- 
lated results using 50,000 cells (Fig. 1b and Extended Data Fig. 1a, b). 
Next, we collected gametes and preimplantation embryos by crossing 
C57BL/6N female mice with the DBA/2N male mice. We obtained 
poor ATAC-seq enrichment in sperm, metaphase II oocytes and pro- 
nuclear stage 5 zygotes despite multiple attempts (data not shown). 
For the 2-cell, 4-cell, 8-cell embryos and inner cell masses (ICMs), 
we carried out ATAC-seq for two biological replicates. One obstacle 
for profiling accessible chromatin in preimplantation embryos is the 
abundant mitochondrial DNA (mtDNA) in the ATAC-seq sequencing 
libraries (Extended Data Fig. 1c). Therefore, we developed a method 
to deplete mtDNA from the ATAC-seq libraries, named CRISPR/ 
Cas9-assisted removal of mitochondrial DNA (CARM) (Fig. 1a 
and Extended Data Fig. 1d). Using a pool of single guide RNAs (sgR- 
NAs) targeting mtDNA, CARM substantially increased the percentages 
of nuclear DNA (Extended Data Fig. 1c, e, f) without affecting ATAC- 
seq enrichment patterns (Extended Data Fig. 1g). A similar strategy was 
used to deplete ribosomal RNA-derived sequencing reads in a previ- 
ous study'!. Using ATAC-seq coupled with CARM, we obtained 40-67 
million monoclonal reads (replicates combined) uniquely mapped to 
the nuclear genome for each stage (Supplementary Table 1). We also 
performed and validated RNA-seq for each stage using Smart-seq” 
(Extended Data Fig. 2a). 
Figure 1 | Accessible chromatin landscape in mouse preimplantation 
embryos. a, Schematic of ATAC-seq and CARM for probing accessible 
chromatin in mouse preimplantation embryos. b, The UCSC browser view 
shows enrichment of ATAC-seq in early embryos. c, The overlap between 
ATAC-seq peaks and annotated promoters (TSS ± 0.5 kb) or distal DHSs 
(away from TSS ± 0.5 kb) in mESCs. A random set of peaks that match 
the lengths of individual ATAC-seq peaks on the same chromosomes 


We conducted several analyses to validate the ATAC-seq data in early 
embryos. First, biological replicates of ATAC-seq showed highly repro- 
ducible results (Fig. 1b and Extended Data Fig. 1a and 2b). Second, a 
large fraction of ATAC-seq peaks in early embryos overlapped with 
promoters and mESC distal DNase I hypersensitive sites (DHSs) 
(Fig. 1c). Third, for developmentally regulated genes, such as Pou5f1, 
Nanog and Foxa1, we found elevated ATAC-seq enrichment at anno- 
tated or putative enhancers and promoters at the same stages when they 
are expressed (Fig. 1d). Fourth, the activation of genes at various stages 
was generally correlated with increased levels of promoter ATAC-seq 
signals (Fig. 1e). Consistent with previous work’, the correlation is 
more evident for promoters with low CG densities (Extended Data 
Fig. 2c). As RNA transcripts can also be inherited from oocytes, we only 
analysed genes activated in preimplantation embryos but not expressed 
in oocytes (denoted hereafter as ‘ZGA-only genes’). Finally, we inves- 
tigated histone modifications using a low-input chromatin immuno- 
precipitation (ChIP) followed by high-throughput DNA sequencing 
(ChIP-seq) method that we recently developed (Methods; manuscript 
submitted) to validate the ATAC-seq data. In the 2-cell embryos, the 
ATAC-seq enrichment showed similar distributions with the active 
mark H3K27ac but not with the repressive mark H3K27me3 (Extended 
Data Fig. 2d, e). Taken together, these data suggest that ATAC-seq 
provides a highly sensitive method to examine open chromatin and 
cis-regulatory element activities in preimplantation embryos. 




were used as a control. d, The UCSC browser views showing the ATAC- 
seq, RNA-seq and DNase I-seq (mESC, ENCODE) enrichment near 
Pou5f1, Nanog, and Foxa1. e, Heat maps show the expression (FPKM) 
of stage-specific genes (ZGA-only) (left) and the ATAC-seq enrichment 
(normalized reads per kilobase per million (RPKM)) (middle) at their 
promoters (TSS ± 2.5 kb). Example genes are also listed (right). 


Allelic landscape of open chromatin 

To determine the global chromatin state on each parental genome, 
we measured allelic ATAC-seq enrichment using single-nucle- 
otide polymorphisms (SNPs) present between the two parental 
strains (Fig. 2a). We interrogated regions covered with sufficient 
reads of which the allele origins were identified (>8 reads per 1 kb 
window; Methods). These regions overlapped with 38-39% of the 
total ATAC-seq peaks at each stage. Among them, a large fraction 
(82-88%) did not show significant allelic bias for ATAC-seq enrich- 
ment (false discovery rate (FDR) < 2%; Fig. 2b and Extended Data 
Fig. 3a, b). Consistently, we found that the majority of ZGA-only 
genes are biallelically transcribed in pooled embryos (Extended Data 
Fig. 3b). Conversely, we also observed allele-specific ATAC-seq signals 
in early embryos (Extended Data Fig. 3c). To determine whether 
this reflects differences in parent of origin, sequence or stochastic 
events, we chose one stage (4-cell) and conducted ATAC-seq anal- 
ysis in the reciprocal cross by mating male C57BL/6N and female 
DBA/2N mice (with two replicates). Among all ATAC-seq peaks 
identified as allele-specific in at least one cross, the majority (72%) 
did not show allele-specific enrichment in the other cross (‘stochastic’) 
(Fig. 2c). About 22% showed preferential enrichment on the allele 
from one strain (‘sequence dependent’). Finally, only 6% showed 
parental-dependent allele-specific open chromatin (‘parent-of-origin’) 
(Fig. 2c). Taken together, about 5.3% of all SNP-covered ATAC-seq 
Figure 2 | Allele-specific accessible chromatin landscape in mouse early 
embryos. a, The UCSC browser view shows allelic ATAC-seq enrichment 
and DNA methylation. mCG/CG data from ref. 16. b, Bar charts showing 
the percentages of ATAC-seq peaks that are biallelic, paternal- or 
maternal-specific. Only peaks covered by sufficient SNPs and allelic reads 
were considered (Methods). c, Pie charts show the percentages of peaks 
(covered by SNPs) with biallelic or allele-specific accessible chromatin 
at the 4-cell stage. d, The average ATAC-seq enrichment and DNA 
methylation levels on each allele around all promoter and distal ATAC-seq 
peaks (covered by SNPs). 


peaks were reproducibly identified as allele-specific peaks (including 
1.1% for parent-of-origin-dependent and 4.2% for sequence- 
dependent; Supplementary Table 2). Regions on the X-chromosomes 
tended to show lower paternal enrichment (Extended Data Fig. 3d), 
which is in part attributable to the lower numbers of paternal chro- 
mosomes in the pooled embryo samples (Xm/Xp in females + Xm/Y in 
males = 2Xm:1Xp). Other parent-of-origin-dependent allele-specific 
peaks include those near the known imprinted gene Snrpn and a novel 
imprinted gene Etv6, a transcription factor that is essential for embry- 
onic development! (Extended Data Fig. 3c). Globally, genes with 
allele-specific expression are preferentially located near allele-specific 
ATAC-seq peaks (Extended Data Fig. 3e). These data suggest that the 
two parental genomes showed comparable accessible chromatin and 
transcription activities after ZGA at the bulk levels, with a small subset 
of regions showing allele-specific open chromatin and transcription. 

Allele-specific expression of imprinted genes is controlled by differ- 
ential DNA methylation). Therefore, we asked if allele-specific ATAC- 
seq peaks are associated with allelic DNA methylation. A global view 
showed that the comparable allelic landscape of accessible chromatin 
is in stark contrast to the distinct allelic DNA methylomes previously 
reported'© (Fig. 2a). For both promoter and distal ATAC-seq peaks, 
DNA methylation shows preferential methylation on the maternal 
allele (Fig. 2d). This is true for biallelic, maternal-specific, and paternal- 
specific peaks (Extended Data Fig. 4a) and still holds when con- 
sidering the demethylation products such as 5-formylcytosine and 
5-hydroxymethylcytosine'® (Extended Data Fig. 4b). These results are 
consistent with the higher level of global DNA methylation on the 
maternal genome at the 2-cell stage’®. The differences in allelic DNA 
methylation at ATAC-seq peaks gradually decrease when development 
proceeds, as both alleles are progressively demethylated'® (Fig. 2d). 
By contrast, imprinted regions maintain allele-specific DNA methylation 
throughout early development!® (Extended Data Fig. 4c). These 
data indicate that in early embryos, maternal and paternal genomes 
showed largely comparable open chromatin landscapes and zygotic 
transcriptomes despite the presence of widespread allele-specific DNA 
methylation. 
Figure 3 | Unique characteristics of accessible chromatin in mouse 
early embryos. a, The average ATAC-seq enrichment (normalized) for 
the top 1,000 most active genes (ZGA-only genes) at each stage. b, The 
UCSC browser view shows ATAC-seq enrichment near two active genes 
(Rps19 and Rps13) in embryos and somatic tissues (data from refs 9, 17). 
ATAC-seq peaks near TSSs and TESs are shaded. c, Enrichment of repeats 
in ATAC-seq promoter and distal peaks compared to that in random peaks 
for early embryos and somatic tissues. The enrichment was calculated as 
a log2 ratio for the numbers of observed peaks that overlap with repeats 
divided by the numbers for random peaks. d, Heat maps show the 
enrichment (calculated similarly as in c) of repeat subfamily in ATAC-seq 
peaks in early embryos and mESCs. 


Accessible chromatin at TESs and repeats 

Next, we investigated whether the accessible chromatin landscape in 
preimplantation embryos is different from those at late developmental 
stages or in somatic cells. Surprisingly, we observed unusually strong 
ATAC-seq signals just downstream of transcription end sites (TESs) in 
2-cell embryos (Fig. 3a). Such enrichment decreases at late stages and 
is much weaker in somatic cells”!” (Fig. 3b). The open chromatin at 
TESs in early embryos preferentially occurs at active genes (Extended 
Data Fig. 5a, b). The TES open chromatin may reflect the binding of 
factors engaged in transcription termination’®. Alternatively, these sites 
may function as enhancers that promote high levels of transcription 
for housekeeping genes at the onset of ZGA"”. These data revealed that 
open chromatin in early embryos is found both at promoters and near 
TESs of active genes. 

In addition, we found distal ATAC-seq peaks in early embryos 
are also distinct from cis-regulatory elements in somatic tissues” 
(Extended Data Fig. 6a). Notably, certain classes of repeats are highly 
transcribed in preimplantation embryos”! and repetitive elements can 
function as promoters and enhancers”»”’. We confirmed that the 2-cell 
stage embryos contain the highest fraction of transcripts from repeats 
among all stages (Extended Data Fig. 6b). In notable contrast to all of 
the somatic tissues that we examined, both promoter and distal ATAC- 
seq peaks showed preferential enrichment for repeats at the 2-cell stage 
(Fig. 3c, d), with several subfamilies of short interspersed nuclear ele- 
ments (SINEs) (B1, B2, and B4) and endogenous retroviral elements 
(ERVL) most enriched. Indeed, the repeat-associated peaks at the 2-cell 
stage probably contributed to the increased total number of ATAC- 
seq peaks (Extended Data Fig. 6c). In support of a functional role for 
these transposable elements”', the transcription start sites (TSSs) of 
2-cell specific genes are more likely to contain repeats particularly for 
long terminal repeats (LTRs) and SINEs (Extended Data Fig. 6d-f). 
As validation, repetitive elements are also enriched in H3K27ac peaks 
in the 2-cell embryos (Extended Data Fig. 6g). Notably, none of the 
transposable elements that we examined (B1, B2, MERVL, and B4) were 
Figure 4 | Identification of candidate regulators for early development. 
a, TF motifs identified from distal ATAC-seq peaks. Only TFs expressed 
at least at one stage (FPKM > 5) and motif enrichment P value < 1 × 10⁻¹⁰ 
at least at one stage were included. b, Schematic of the Gata4 knockdown 
experiments. A Venn diagram shows the overlap between Gata4- 
knockdown downregulated genes and ICM-specific genes (compared 

to mESCs), with the P value (hypergeometric distribution) indicated. 

c, Bar charts showing the relative gene expression with the GFP 
knockdown (KD) embryos normalized to 1. Error bars denote the standard 
errors of single-embryo RNA-seq FPKM values (n= 3 for GFP KD; n=4 
for Gata4 KD). d, Key regulators for ICMs and TEs identified by MARINa. 
In each row, all genes were sorted (from left to right) by their differential 
expressions in ICM versus TE cells. Predicted positively and negatively 
regulated TF targets are marked as red or blue bars on the basis of their 
co-expression patterns with TFs across individual cells in blastocysts. 

The P values (FDR-corrected) represent the statistical significance of 
enrichment estimated by permutating the ICM and TE samples. 

e, Heat maps show ICM to TE expression ratio (top) and Nr5a2 to GFP 
knockdown expression ratio in the 8-cell embryos (bottom). 


enriched near TESs (Extended Data Fig. 6h), suggesting that the open 
chromatin near TESs is unlikely to be caused by repeats. In sum, these 
data suggest that the accessible chromatin landscape in early embryos 
is extensively shaped by transposable elements. 


Regulatory network in early development 

As enhancers are known to be hotspots of transcription factor (TF) 
binding”, we asked whether the distal ATAC-seq peaks harbour 
motifs for TFs regulating preimplantation development. We confirmed 
that distal ATAC-seq peaks are highly stage-specific (Extended Data 
Fig. 7a). The GREAT analysis, which annotates non-coding regions 
by analysing their nearby genes”*, showed that early stage-specific dis- 
tal peaks are frequently located near genes functioning in chromatin 
regulation (Extended Data Fig. 7b). Consistently, chromatin regu- 
lator genes are generally upregulated at early stages (2-8 cell stages) 
(Extended Data Fig. 7c). In contrast, distal peaks in ICMs and mESCs 
are preferentially associated with genes involved in development 
(Extended Data Fig. 7b). Next, using motif analysis software HOMER” 
we found the binding motifs for a set of transcription factors enriched 
in distal peaks in a highly stage-specific manner (Fig. 4a). Notably, 
the majority of these transcription factors, including CTCF, NR5A2, 
TEAD4, GATA4, POU5F1, SOX2 and NANOG, are essential for early 
development. Importantly, the timing of the appearance of the motifs 
coincides with the expression of their corresponding TFs. For example, 
both the binding motifs and the expression of NR5A2 and RARG are 
strongly enriched at the 2-8-cell stages (Fig. 4a). A similar observation 
was made for FOXA1 (4-8-cell stages), POU5F1, SOX2 and NANOG 
(known regulators of ICMs and mESCs). Interestingly, the GATA fac- 
tors are strongly enriched in distal peaks in ICMs but not in mESCs 
(Fig. 4a) (the motifs for individual GATA family members are highly 
similar, data not shown). Whereas mESCs are typically derived from 
ICMs, it remains unknown to what extent their chromatin states truly 
resemble each other. We found that ICMs and mESCs showed distinct 
landscapes for distal ATAC-seq peaks (Extended Data Fig. 7b) and 
significant differences in gene expression (Extended Data Fig. 7d, e). 
As ICMs can give rise to primitive endoderm (PE) and epiblast® but 
mESCs most resemble preimplantation epiblast?’, we speculate that 
genes expressed in ICMs, but not in mESCs, may preferentially enrich 
for PE-fate genes. As GATA4, an essential regulator for embryonic devel- 
opment, is a marker of PE?’ and is not expressed in mESCs (Fig. 4a), 
we tested whether the downregulation of GATA4 may promote the 
conversion of the transcriptome of ICMs towards mESCs. We injected 
Gata4 siRNA in zygotes and analysed the effect on blastocysts using 
RNA-seq (Fig. 4b). Indeed, genes downregulated by Gata4 knockdown 
are strongly enriched for ICM-specific genes (Fig. 4b) (overlap gene 
n=50, Pvalue=2.6 x 10~°), but not for mESC-specific genes (overlap 
gene n= 24, P value =0.68). The expression of PE marker genes Sox7, 
Sox17 and Pdgfra, but not the epiblast markers Nanog and Pou5f1, was 
substantially reduced (Fig. 4c). The expression of Gata6, which func- 
tions upstream of Gata4 (ref. 29), remained unchanged. Interestingly, 
Sox2 is also downregulated upon Gata4 knockdown (Fig. 4c). 
As SOX2 is known to promote PE lineage*’, we speculate that GATA4 
may also in turn regulate the expression of Sox2. These data suggest 
that GATA4 is a regulator of ICM circuitry and its downregulation may 
have a role in resetting the transcription programs of ICMs to mESCs. 
An even earlier lineage specification event before PE and epiblast 
differentiation is the segregation of ICM and trophectoderm (TE)°. 
To determine which factors regulate ICM and TE segregation, we 
performed RNA-seq in isolated ICMs and TEs. Interestingly, the 
known master regulators of ICM (such as Pou5f1 and Nanog) and 
TE (such as Cdx2 and Eomes) are not among the most differentially 
expressed genes (Extended Data Fig. 8a). This is possibly because 
these key regulators are expressed before lineage segregation and 
their transcripts are only asymmetrically distributed between TEs 
and ICMs*!. In contrast, genes exclusively expressed in ICMs but 
not in TEs are often expressed at later stages (Extended Data Fig. 8a), 
indicating that they are probably downstream effectors. To iden- 
tify possible upstream transcription factors, we employed MARINa 
(MAster Regulator INference algorithm)*’, which identifies candidate 
key TFs by investigating whether the expression of TF target genes 
(instead of TFs themselves) are enriched in cell-state-specific tran- 
scription programs. To identify the possible target genes for TFs iden- 
tified in our motif analyses, we predicted promoter-enhancer (distal 
peak) pairs with correlated activities (ATAC-seq enrichment) across 
development stages as previously described** (Extended Data Fig. 8b). 
The promoter-enhancer pairs were validated by high levels of corre- 
lation between corresponding gene expression and ATAC-seq signal 
on distal peaks (Extended Data Fig. 8c, d). We then searched for the 
TF motifs in each distal peak and linked the connected promoter 
and gene to the TFs as their targets (Extended Data Fig. 8b). The TF 
targets were further separated into positively and negatively regu- 
lated genes based on their co-expression patterns with the TFs, using 
single-cell RNA-seq data in blastocysts!*. By doing so, MARINa 
correctly identified known ICM markers POU5F1, NANOG, SOX2, 
ESRRB and KLF4, as well as the TE regulators GATA3 and TEAD4 
(Fig. 4d). As controls, factors such as CTCF, GABPA and TFAP2C were 
not identified as either ICM or TE regulators. We focussed on NR5A2, 
which is known to promote the expression of Pou5f1 and Nanog, 
and whose knockout leads to early embryo death at E6.5-7.5 
(refs 34, 35). Interestingly, Nr5a2 is highly expressed at the 2-8 cell 
stages (Fig. 4a and Extended Data Fig. 9a), raising the possibility that 
it may be an early regulator of ICM/TE programs. In support of this, 
analysis of individual single-cell RNA-seq data!” showed that in the 
8-cell embryos, the expression of Nr5a2 is positively correlated with 
the ICM marker genes including Pou5f1, Sall4 and Tdgf1, and is negatively 
correlated with the TE marker genes such as Cdx2 (Extended 
Data Fig. 9b). This pattern was even more evident in blastocysts. We 
then knocked down Nr5a2 in zygotes and collected embryos at the 
8-cell stage for RNA-seq analysis (Extended Data Fig. 9c). Indeed, 
ICM marker genes including Nanog, Pou5f1, Tdgf1 and Tdh were 
downregulated (Fig. 4e). By contrast, TE marker genes such as Id2, 
Gata3, Fabp3 and Krt8 were upregulated. TFs that are expressed in 
both ICM and TE such as Gabpa, Ctcf and Sp1 were not affected. 
Therefore, these data indicate that NR5A2 regulates the expression 
of key TFs for ICM and TE as early as the 8-cell stage. Notably, the 
8-cell-specific genes are also preferentially downregulated upon 
depletion of Nr5a2 (Extended Data Fig. 9c). Taken together, by inte- 
grating the information about cis-regulatory elements and the tran- 
scriptome, we uncovered a regulatory network for preimplantation 
development that is orchestrated by a set of transcription factors and 
their targets. 


A unique chromatin state in minor ZGA 

Interestingly, whereas cis-regulatory elements are crucial for gene 
regulation, previous studies suggested that enhancers may be dispen- 
sable for transcription of reporters in zygotes*®. Instead, enhancers 
only become essential during the course of ZGA when chromatin is 
proposed to progressively adopt a repressive state*”. However, these 
experiments were largely based on exogenous reporters. To investi- 
gate the chromatin state before major ZGA in vivo, we examined early 
2-cell embryos in which minor ZGA is most evident but preceding 
major ZGA**. Our RNA-seq analysis found only a few genes (n=98, 
fragments per kilobase of transcript per million mapped reads (FPKM) 
> 5) actively transcribed at this stage and a large fraction (48%) of them 
(such as Zscan4) reside in clusters in the genome (Extended Data 
Fig. 10a–c). Consistent with limited gene activities in early 2-cell 
embryos, we obtained generally weak and noisy ATAC-seq enrich- 
ment, and the detected peaks are reduced for both number (Extended 
Data Fig. 10d) and genome coverage (Extended Data Fig. 10e) com- 
pared to those at the 2-cell stage. These peaks are enriched for repet- 
itive elements with classes similar to those found in 2-cell embryos 
(Extended Data Fig. 10f). Interestingly, when searching for regions 
(100 kb bin) with strongest ATAC-seq enrichment in the genome, we 
discovered that a large fraction of them (52% of the top 100 regions) 
contain or are in the proximity (100 kb) of the full-length MERVLs. 
These full-length MERVLs have relatively intact open reading frames 
and are usually highly transcribed (data not shown). Notably, MERVLs 
are the most highly transcribed repeats in the 2-cell embryos where 
they may initiate over 300 genes*!~®. The expression of MERVLs was 
reported to be restricted in a very short time window at the early 2-cell 
stage*®, although our RNA-seq data indicate that their transcripts may 
be accumulated and retained at later stages (Extended Data Fig. 10g). 
Surprisingly, the open chromatin near MERVL is large in size (up to 
117 kb; median length, 40 kb) and is specifically present at the 3' down- 
stream of active MERVLs (Fig. 5a, b, Extended Data Fig. 10h). For 
brevity, we termed these domains as ‘minor-ZGA-associated accessi- 
ble chromatin domains’ (MAC domains). Interestingly, promiscuous 
transcription lacking 3’ processing or splicing is reported to occur spe- 
cifically in minor ZGA*. We found that the MAC domains coincide 
with strong unterminated transcripts of MERVLs that invade the 3’ 
downstream regions (Fig. 5a–c). The MAC domains near MERVLs 
were absent in both 2-cell embryos and early 2-cell embryos treated 
with alpha-amanitin, which blocks transcription (Fig. 5a, b). Finally, 
we found that non-repeat early 2-cell genes are also preferentially asso- 
ciated with broad ATAC-seq domains, although at weaker levels than 
Figure 5 | Transcription regulation and chromatin state in minor ZGA. 
a, The UCSC genome browser views show the promiscuous transcription*! 
and broad accessible chromatin domains downstream of MERVL or 
Zfp352. Zygote + DRB, zygote treated with transcription inhibitor 

DRB; APH, aphidicolin; alpha, alpha-amanitin. b, The average ATAC- 
seq enrichment around MERVLs. The MERVL gene itself is not shown 
owing to its highly repetitive nature and the resulting low mappability 
(the upstream and downstream regions of MERVL are mappable). c, The 
average promiscuous transcription levels around MERVLs measured 

by total RNA-seq*!. d, A model shows the distinct transcription and 
chromatin states in pre-ZGA, minor ZGA and major ZGA. 


those near MERVLs (Extended Data Fig. 10i). These include Zfp352, 
which is the most highly expressed non-repeat gene in our data (Fig. 5a 
and Extended Data Fig. 10j). Taken together, our study revealed a unique 
chromatin landscape in minor ZGA that features broad domains of 
open chromatin covering promiscuous transcription. 


Discussion 

Here we provide a genome-wide survey of accessible chromatin in 
the mouse preimplantation embryos using ATAC-seq. A fundamen- 
tal question in preimplantation development is to what extent gene 
expression is linked to epigenome reprogramming. Our data indicate 
that gene activation and establishment of open chromatin could occur, 
at least in part, through different pathways from those for epigenetic 
modification reprogramming. One possible reason for the differences 
is that epigenetic modifications can be inherited from oocyte and 
sperm. The open chromatin, in contrast, could be newly established 
when global transcription takes place. Another surprising finding is the 
unique chromatin state in the minor ZGA featured by broad domains of 
accessible chromatin over extended promiscuous transcripts (Fig. 5d). 
Such broad accessible chromatin is in notable contrast to open chro- 
matin at short regulatory elements and supports a globally permissive 
chromatin state in minor ZGA“!. The relaxed chromatin environ- 
ment may contribute to the pervasive transcription from repeats and 
the ultimate integration of these elements in the host program in 
evolution®’. On the other hand, such promiscuous transcription and 
globally permissive chromatin may be detrimental for major ZGA. 
The establishment of repressive chromatin together with stage- 
specific regulatory elements such as promoters and enhancers prob- 
ably ensures accurate control of the zygotic transcription program. 
Intriguingly, transposable elements and the TESs of active genes also 
show strong ATAC-seq enrichment after major ZGA, particularly at 
early stages. These findings indicate additional gene regulatory modules 
in preimplantation embryos. We postulate that the TES accessible 
chromatin may be associated with regulators actively engaged at TESs 
to prevent promiscuous transcription, as observed in minor ZGA}. 
Alternatively, it may promote efficient and potent transcription by 
forming loops with the promoters! for housekeeping genes. Taken 
together, our data not only unveiled the chromatin landscapes in minor 
and major ZGAs, but also allowed genome-wide identification of reg- 
ulatory circuitry in early development. Further investigations with 
additional complementary approaches are warranted to fully dissect 
epigenomic reprogramming during preimplantation embryogenesis. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
allocation during outcome assessment. 

Embryo collection. All analysed embryos were collected from 5- to 6-week-old 
C57BL/6N females mated with DBA/2N males (Vital River). For reciprocal experi- 
ments, C57BL/6N males and DBA/2N females were used. To induce ovulation, 
females were administered 5 IU of hCG intraperitoneally, 44—48 h after injection 
of 5 IU of PMSG (San-Sheng pharmaceutical Co. Ltd). Each set of embryos at a 
particular stage was flushed from the reproductive tract at defined time periods 
after hCG administration: 30h (early 2-cell), 39-43 h (2-cell), 54-56h (4-cell), 
68-70 h (8-cell) and 92-94h (blastocysts) in HEPES-buffered CZB medium. To 
inhibit transcription in the minor ZGA, early zygotes (PN3) were cultured in CZB 
supplemented with alpha-amanitin for about 14h. Embryos were selected by cell 
number or morphology, with their zona pellucida gently removed by treatment of 
10 IU ml⁻¹ pronase (Sigma P8811) for several minutes. Blastocysts were incubated 
in a 1:3 dilution of anti-mouse rabbit serum in DMEM medium for 20 min, washed 
in PBS and further incubated for 20 min in a 1:5 dilution of rat serum in DMEM 
for the complement reaction. The ICM was subsequently cleaned from lysed tro- 
phectoderm with a narrow glass pipette. The resulting cells were manually picked 
and treated with the lysis buffer for ATAC-seq or Smart-seq2. 

All animal maintenance and experimental procedures were carried out accord- 
ing to guidelines of Institutional Animal Care and Use Committee (IACUC) of 
Tsinghua University, Beijing, China. 

Cell culture. mESCs (R1) were cultured on gelatin in DMEM containing 15% FBS, 
leukaemia inhibiting factor, penicillin/streptomycin, L-glutamine, β-mercaptoethanol, 
and non-essential amino acids. 

ATAC-seq library preparation and sequencing. The ATAC-seq libraries of 
mESCs and early mouse embryos were prepared as previously described with 
minor modifications®. Briefly, samples were lysed in lysis buffer (10 mM Tris-HCl 
(pH 7.4), 10 mM NaCl, 3 mM MgCl2 and NP-40) for 10 min on ice to prepare the 
nuclei. The optimized concentration of NP-40 is 0.15% for mESCs, the 2-cell, 
4-cell, 8-cell embryos and 0.5% for early 2-cell and ICMs. Immediately after lysis, 
nuclei were spun at 500g for 5 min to remove the supernatant. Nuclei were then 
incubated with the Tn5 transposome and tagmentation buffer at 37°C for 30 min 
(Vazyme Biotech). After the tagmentation, the stop buffer was added directly 
into the reaction to end the tagmentation. PCR was performed to amplify the 
library for 15 cycles using the following PCR conditions: 72°C for 3 min; 98°C 
for 30s; and thermocycling at 98°C for 15s, 60°C for 30s and 72°C for 3 min; 
followed by 72°C for 5 min. After the PCR reaction, libraries were purified with 
the 1.2x AMPure (Beckman) beads before proceeding for mitochondrial DNA 
depletion. 

CRISPR/Cas9-assisted removal of mitochondrial DNA (CARM). The NGG 
protospacer adjacent motif (PAM) sequences were first identified in the genome 
of mitochondria, and the upstream 20 bp of PAM sequences was used as sgRNA 
candidates. A total of 114 sgRNAs were then selected and synthesized to cover 
approximately every 140 bp over the mitochondrial genome. Paired oligonucleo- 
tides were annealed, pooled, and were ligated to the pUC57kan-T7-gRNA vector, 
which were further transformed and amplified. In vitro transcription was performed 
to produce sgRNAs (MEGAshortscript Kit, Thermo Fisher Scientific). 
Each ATAC-seq library was incubated with 330-500 ng sgRNA and 1 µg Cas9 
protein (PNA Bio CP01-50) for 2h at 37°C. After incubation, the reaction was 
treated by RNaseA before being terminated by adding the stop buffer (30% glycerol, 
1.2% SDS, 250mM EDTA, pH 8.0). The ATAC-seq library was further purified 
by 1.2x AMPure beads and was subjected to sequencing on HiSeq1500 or 2500 
(Illumina) according to the manufacturer's instruction. 
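
The sgRNA candidate search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names `find_sgrna_candidates` and `thin_by_spacing` are hypothetical, only the forward strand of a linear sequence is scanned, and the greedy spacing rule is one simple way to obtain roughly one guide per 140 bp.

```python
def find_sgrna_candidates(genome, protospacer_len=20):
    """Return (pam_start, protospacer) for each NGG PAM on the forward strand.

    Assumption: the sequence is treated as linear and only the forward
    strand is scanned; a real design would also consider the reverse strand.
    """
    genome = genome.upper()
    candidates = []
    # i is the position of the 'N' in NGG; the 20 bp upstream is the guide.
    for i in range(protospacer_len, len(genome) - 2):
        if genome[i + 1:i + 3] == "GG":
            candidates.append((i, genome[i - protospacer_len:i]))
    return candidates


def thin_by_spacing(candidates, spacing=140):
    """Greedily keep candidates whose PAMs are at least `spacing` bp apart."""
    selected = []
    last_pos = None
    for pos, seq in candidates:
        if last_pos is None or pos - last_pos >= spacing:
            selected.append((pos, seq))
            last_pos = pos
    return selected


# Toy sequence: 20 A's followed by a TGG PAM.
toy = "A" * 20 + "TGG" + "C" * 5
toy_candidates = find_sgrna_candidates(toy)
```

On the toy sequence, the single candidate is the 20-bp run of A's immediately upstream of the PAM.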

RNA-seq library preparation and sequencing. The RNA-seq libraries were gener- 
ated from early mouse embryos and ES cells using Smart-seq2 (ref. 42) with minor 
modification. Cells were lysed in hypotonic lysis buffer (Amresco, M334), and the 
polyadenylated mRNAs were captured by the PolyT primers. After ~3-10 min 
lysis at 72 °C, the Smart-seq2 reverse transcription reactions were performed. 
After pre-amplification and AMPure XP beads purification, cDNAs were sheared 
by Covaris and were subject to Illumina TruSeq library preparation. All libraries 
were sequenced on Illumina HiSeq1500 or 2500 according to the manufacturer's 
instruction. 

ChIP-seq library preparation and sequencing. To investigate histone modifica- 
tions in early embryos, we developed a low-input ChIP-seq method (manuscript 
in submission). Briefly, each sample is lysed and subjected to MNase digestion at 
37°C. The reaction is terminated by adding stop buffer (110 mM Tris-HCl (pH 8.0), 
55mM EDTA) and cold 2x RIPA buffer. After spinning at max speed, the super- 
natant is transferred to a new tube. Each chromatin sample is supplemented with 
RIPA buffer and is incubated with antibodies for H3K27ac (Active Motif 39133) 


or H3K27me3 (Diagenode pAb-069-050) (overnight at 4°C). The next day, the 
sample is incubated with protein A dynabeads (Life Technologies) for 2h with 
rotation at 4°C. The beads are washed with RIPA buffer and LiCl buffer. After 
washing, tubes are spun briefly and the supernatant is removed. Beads are resuspended 
with ddH2O and Ex-Taq buffer (TaKaRa). Proteinase K (Roche) is then 
added to elute DNA from beads. The supernatant is then transferred to a new 
tube and the proteinase K is heat inactivated. The end of DNA is repaired by rSAP 
(NEB) followed by heat inactivation. The resulting sample is subjected to library 
preparation as previously described’. 

Gene knockdown in mouse embryos. Fertilized C57BL/6J zygotes for siRNA 
microinjection were collected from the oviducts of female mice at 18h after 
injection with hCG. All siRNA oligonucleotides were synthesized with 5’-CY3 
and 2’-O-Me modification. The siRNA sequences were as follows: Gata4-Mus-1748: 
5’-GUCCCAGACAUUCAGUACUTT-3’; Gata4-Mus-1244: 5’-GGCAGAGAGUGUGUCAAUUTT-3’; 
Gata4-Mus-2811: 5’-GGUUGUGUCUACAGCACAATT-3’; 
Nr5a2-Mus: 5’-GCAAGTGTCTAATTTAAA-3’. An injection 
pipette containing 20 mM siRNA solution was inserted into the cytoplasm of the 
zygotes. About 40 zygotes were microinjected and further cultured up to the 8-cell 
(Nr5a2) or blastocyst stage (Gata4) in CZB medium containing glutamine at 37°C 
under 5% CO2 in air. 

ATAC-seq data processing. All ATAC-seq reads were first aligned to the genomes 
of the C57BL and DBA strains separately using Bowtie2 (version 2.2.2). SNP tables 
for DBA/2J and C57BL/6N were downloaded from the Sanger Institute Mouse 
Genome Project’. The DBA/2J and C57BL/6N genomes were generated by substituting 
corresponding bases from the mm9 genome. As we used the DBA/2N strain 
instead of the DBA/2J strain, we verified the identity of this strain by sequencing its 
genome. The genomes of DBA/2N and DBA/2J are highly similar as 99.4% of SNPs 
identified in the DBA/2N strain (compared to the reference genome) are the same 
as those found in the DBA/2J strain identified by the Sanger Institute (data not 
shown). Therefore, we used the DBA/2J genome (which has a deeper sequencing 
depth) for all subsequent analyses. The single-end ATAC-seq reads were aligned 
with the parameters -t -q -N 1 -L 25 and the paired-end ATAC-seq reads were 
aligned with the parameters -t -q -N 1 -L 25 -X 2000 --no-mixed --no-discordant. 
ChIP-seq reads were aligned to mm9 with the parameters -t -q -N 1 -L 25. All 
unmapped reads, non-uniquely mapped reads and PCR duplicates were removed. 
For downstream analysis, we normalized the read counts by computing the 
numbers of reads per kilobase of bin per million of reads (RPKM). To visualize the 
ATAC-seq signal in the UCSC genome browser, we extended each read by 250 bp 
and counted the coverage for each base. 
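
As a worked illustration of the RPKM normalization mentioned above, the following minimal Python sketch computes reads per kilobase of bin per million mapped reads. The function name `rpkm` and the example numbers are assumptions made for illustration only.

```python
def rpkm(read_count, bin_length_bp, total_mapped_reads):
    """Reads per kilobase of bin per million mapped reads."""
    if bin_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("bin length and total reads must be positive")
    kilobases = bin_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


# Hypothetical example: 50 reads in a 100-bp bin, 20 million mapped reads.
example_rpkm = rpkm(50, 100, 20_000_000)
```

Dividing by both bin length and library size makes values comparable across bins of different widths and libraries of different depths.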

RNA-seq data processing. All RNA-seq data were mapped to the mm9 genome 
by STAR (version 2.4.0)*°, which was shown to be highly effective in mapping 
RNA-seq reads containing SNPs!*. The RNA-seq data for mESCs and ICMs 
from Tang et al.4° were downloaded and mapped similarly. The gene expression 
level was calculated by Cufflinks (version 2.2.1) using the refFlat database from 
the UCSC genome browser. To quantify the expression level of total repeats in 
mouse preimplantation embryos, the reads were mapped to RepBase by Bowtie2 
(version 2.2.2) and only uniquely mapped reads were kept for further analysis. To 
obtain reliable reads counts for each family of repeats (MERVL), reads mapped 
to mm9 were counted as RPKM on the basis of the locations of annotated repeats 
(RepeatMasker) downloaded from the UCSC genome browser. 

Allele assignment of sequencing reads. Allele assignment of sequencing reads 
was conducted as described previously*” and we validated our allelic analyses using 
the sequenced genomic DNA (cortex) extracted from the C57BL/6N and DBA/2N 
strains used in this study, which showed that 99.6% and 98.9% reads were correctly 
assigned, respectively. 

The comparison between ATAC-seq replicates. The correlation between different 
ATAC-seq replicates was calculated as follows: each read was extended 
250 bp from the mapped end position and the RPKM value was generated on a 
100-bp-window base. The ATAC-seq enrichment was then summed within each 
5-kb-window for the entire genome and was compared across different samples. 
Pearson correlation was used for all analyses. 
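
The replicate comparison can be sketched as follows, assuming the 100-bp RPKM track is already available as a list of numbers for each replicate. The helper names are hypothetical, and the Pearson coefficient is written out with the standard library only so that the sketch is self-contained.

```python
import math


def sum_into_windows(values, bins_per_window=50):
    """Sum consecutive 100-bp bins into 5-kb windows (50 bins per window)."""
    return [
        sum(values[i:i + bins_per_window])
        for i in range(0, len(values), bins_per_window)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy tracks: replicate 2 is replicate 1 scaled by a constant factor.
rep1 = [1.0] * 50 + [2.0] * 50 + [4.0] * 50
rep2 = [2.0 * v for v in rep1]
replicate_r = pearson(sum_into_windows(rep1), sum_into_windows(rep2))
```

Because Pearson correlation is insensitive to linear scaling, two tracks that differ only by a constant factor correlate perfectly.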

Identification of promoter and distal ATAC-seq peaks. All the ATAC-seq peaks 
were called by MACS with the parameters --nolambda --nomodel. ATAC-seq peaks 
that are at least 2.5 kb away from annotated promoters (RefSeq, Ensembl and 
UCSC Known Gene databases combined) were selected as distal ATAC-seq peaks. 
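
A minimal sketch of the distal-peak filter follows. It assumes that the distance is measured from the nearest peak edge to each annotated TSS on the same chromosome; the text does not specify the exact distance definition, so this is an illustrative choice, and `is_distal` is a hypothetical name.

```python
def is_distal(peak_start, peak_end, tss_positions, min_distance=2500):
    """True if the peak lies at least `min_distance` bp from every TSS.

    Assumption: all coordinates are on the same chromosome and distance is
    measured from the nearest peak edge (0 if the TSS is inside the peak).
    """
    for tss in tss_positions:
        if peak_start <= tss <= peak_end:
            distance = 0
        else:
            distance = min(abs(tss - peak_start), abs(tss - peak_end))
        if distance < min_distance:
            return False
    return True


# Hypothetical peak at 10,000-10,500 bp.
far_from_tss = is_distal(10_000, 10_500, [5_000])
near_tss = is_distal(10_000, 10_500, [8_000])
```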
Comparison of ATAC-seq peaks and known cis-regulatory elements or 
H3K27ac ChIP-seq peaks. To compare the ATAC-seq peaks identified in early 
embryos with the annotated cis-regulatory elements, we calculated the overlap 
between the ATAC-seq peaks of different stages and ± 0.5 kb around the TSSs of 
annotated promoters (RefSeq, Ensembl and UCSC Known Gene). Non-promoter 
(distal) peaks were then compared to distal DHSs in mESCs (ENCODE) or 2-cell 
H3K27ac ChIP-seq peaks called by MACS. Random peaks were generated by 
selecting random regions in the genome with the sizes matching each individual 
ATAC-seq peak. 
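
One possible way to generate size-matched random peaks, together with the log2 observed-to-random ratio used for the repeat enrichment in Fig. 3c, is sketched below. The chromosome sizes, function names and the choice to sample chromosomes in proportion to their length are assumptions for illustration; the text does not describe these details.

```python
import math
import random


def random_matched_peaks(peaks, chrom_sizes, seed=0):
    """Draw one random region per peak with the same length.

    Assumption: chromosomes are sampled in proportion to their length.
    """
    rng = random.Random(seed)
    names = list(chrom_sizes)
    weights = [chrom_sizes[name] for name in names]
    shuffled = []
    for _chrom, start, end in peaks:
        length = end - start
        if length > max(weights):
            raise ValueError("peak is longer than every chromosome")
        # Redraw until the chosen chromosome can hold the region.
        while True:
            chrom = rng.choices(names, weights=weights)[0]
            if chrom_sizes[chrom] >= length:
                break
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def log2_enrichment(observed_overlaps, random_overlaps):
    """log2 ratio of observed to random overlap counts."""
    return math.log2(observed_overlaps / random_overlaps)


# Toy genome and peaks (hypothetical sizes and coordinates).
toy_sizes = {"chr1": 10_000, "chr2": 5_000}
toy_peaks = [("chr1", 100, 350), ("chr2", 0, 1_000)]
toy_random = random_matched_peaks(toy_peaks, toy_sizes)
```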

Identification of stage-specific genes. A Shannon-entropy-based method was 
used to identify stage-specific genes, as previously described*. Due to the pos- 
sible confounding effects from maternally inherited RNA transcripts, ZGA-only 
genes were analysed, which were defined as those not expressed in oocytes (FPKM 
< 0.5) but that are activated (FPKM>1) after ZGA (in either the 2-cell, 4-cell, 
8-cell embryos or ICMs). Genes with entropy score less than 2 were selected as 
candidates for stage-specific genes. Among these genes, we selected candidates 
of stage-specific genes for each stage based on the following criteria: the gene is 
highly expressed at this stage (FPKM > 3), and such high expression cannot be 
observed in more than two additional stages. These genes were then reported in 
the final stage specific gene lists. 
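
The selection rule above can be sketched as follows. The entropy definition used here (relative expression across stages, log base 2) is an assumption based on the common form of this method; the cited reference may differ in detail. The function names are hypothetical.

```python
from math import log2

def shannon_entropy(values):
    """Entropy (bits) of the relative expression profile across stages."""
    total = sum(values)
    if total <= 0:
        raise ValueError("gene is not expressed at any stage")
    probs = [v / total for v in values if v > 0]
    return -sum(p * log2(p) for p in probs)

def stage_specific_stages(fpkm_by_stage, entropy_cutoff=2.0,
                          high_fpkm=3.0, max_extra_stages=2):
    """Return the stages for which a gene counts as stage specific.

    The gene must have entropy below the cutoff, and high expression
    (FPKM > 3) may occur at no more than two additional stages.
    """
    if shannon_entropy(list(fpkm_by_stage.values())) >= entropy_cutoff:
        return []
    high = [stage for stage, v in fpkm_by_stage.items() if v > high_fpkm]
    if len(high) - 1 > max_extra_stages:
        return []
    return high
```

With five stages the maximum possible entropy is log2(5), about 2.32, so a cutoff of 2 removes genes whose expression is close to uniform across stages.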

Identification of ICM and mESC specific genes. ICM- and mESC-specific 
genes were identified as follows: a minimum twofold change in expression levels 
(FPKM) between ICMs and mESCs was required. Genes with low expression 
(FPKM <1) in both ICMs and mESCs were removed. Those that fulfil such criteria 
in both our dataset and that of Tang et al.*° were selected as the final list of 
ICM- and mESC-specific genes. 

Analysis of the promoter ATAC-seq enrichment. To ensure that the promoters 
that we analysed reflect the truly active promoters (as many genes have alternative 
promoters), we only included genes that contain a single promoter, or multiple 
promoters that are located within 500 bp of each other, in which case a promoter 
would be randomly chosen. The ATAC-seq signals at promoters (±2.5 kb) were 
computed across all promoters in the genome. 

Identification of allele-specific accessible chromatin regions. To identify the 
allele-specific accessible chromatin regions, the mouse genome was divided into 
consecutive 1-kb-bins and only those bins covered by at least 8 reads of which 
the parental origins were determined by SNPs were considered. The significance 
of allele bias for each bin was then assessed via the binomial test and the allele 
score (AS score) was defined as -log10(P value). To estimate the FDR for all reads 
that contain informative SNPs, we permuted their parental origins by randomly 
assigning them to two arbitrary alleles. A similar random AS score (R-AS score) 
for the allele bias was computed for each region using the permuted data set. The 
permutation was performed five times and the average R-AS scores were used to 
assess the global FDR under an AS score cutoff. Allele-specific accessible chromatin 
regions were identified using a cutoff of AS score 3 (absolute value, corresponding 
to a P value of 0.001), resulting in a less than 2% FDR for all the stages. 
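
A minimal sketch of the AS score for a single 1-kb bin is given below. It assumes a two-sided binomial test against an expected allelic ratio of 0.5 and returns an unsigned score; the sign convention implied by "absolute value" in the text is not reproduced here. The function names are hypothetical.

```python
from math import comb, log10

def binomial_two_sided_p(maternal_reads, paternal_reads):
    """Two-sided binomial P value for allele bias, assuming p = 0.5."""
    n = maternal_reads + paternal_reads
    k = max(maternal_reads, paternal_reads)
    # Upper tail P(X >= k); the distribution is symmetric when p = 0.5,
    # so the two-sided P value is twice the tail, capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def allele_score(maternal_reads, paternal_reads):
    """AS score, defined as -log10(P value)."""
    p = binomial_two_sided_p(maternal_reads, paternal_reads)
    return float("inf") if p == 0 else -log10(p)
```

Under these assumptions, a bin with 11 reads all from one allele has P = 2/2048, which passes the AS score cutoff of 3, whereas 10 reads all from one allele (P = 2/1024) does not.
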

Identification of allele-specific genes and ATAC-seq peaks. To identify genes that 
are expressed in an allele-specific manner, allelic reads mapped to either the 
maternal or paternal allele were identified as described above. Allelic reads mapped to 
exons for each gene were counted by Htseq-count"*. Only genes with FPKM > 1 
that are covered by at least 30 allelic reads were kept for downstream analysis. 
The allele-specific genes were identified by at least a threefold change between the 
numbers of maternal and paternal reads and a minimum AS score of 3 (similarly 
computed as for the ATAC-seq data). ATAC-seq peaks called by MACS were iden- 
tified as allele-specific ones if they overlap with 1 kb allele-specific region with at 
least 50% of the length of the peak. Biallelic ATAC-seq peaks were identified if they 
overlap with the non-allele-specific 1 kb region covered by SNPs. 

DNA methylation analysis at ATAC-seq peaks. The average DNA methylation 
levels within the defined ATAC-seq peak regions, as well as 7.5 kb of their upstream 
and downstream regions, were computed. The MethylC-seq data (which measure 
5mC + 5hmC), fCAB-seq data (5fC + 5mC) and TAB-seq data (5hmC) were 
downloaded from a previous study"®. 

Gene ontology analysis. The DAVID web-tool was used to identify the GO 
terms”. 

The comparison between ATAC-seq peaks and repetitive elements. To iden- 
tify the overlap between repetitive elements and promoter or distal ATAC-seq 
peaks, the ATAC-seq peaks were compared with the locations of annotated repeats 
(RepeatMasker) downloaded from the UCSC genome browser. As repeats of dif- 
ferent classes vary greatly in numbers, a random set of peaks with identical lengths 
of ATAC-seq peaks was used for the same analysis as a control. The numbers of 
observed peaks that overlap with repeats were compared to the numbers of random 
peaks that overlap with repeats, and a log ratio value was generated as the ‘observed 
to expected’ enrichment. 
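
The observed-to-expected comparison can be sketched as below. The log base (2) and the pseudocount are assumptions added for illustration, since the text does not specify them, and the intervals are taken to lie on a single chromosome as half-open (start, end) pairs.

```python
from math import log2

def overlaps_any(peak, repeats):
    """True if the peak overlaps at least one repeat interval."""
    start, end = peak
    return any(start < r_end and r_start < end for r_start, r_end in repeats)

def log_ratio_enrichment(observed_peaks, random_peaks, repeats, pseudocount=1):
    """log2 of (observed overlaps / random overlaps), with a pseudocount."""
    observed = sum(overlaps_any(p, repeats) for p in observed_peaks)
    expected = sum(overlaps_any(p, repeats) for p in random_peaks)
    return log2((observed + pseudocount) / (expected + pseudocount))
```

Using length-matched random peaks as the denominator controls for the very different genomic abundance of each repeat class.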

Comparison of stage specificity between promoter and distal ATAC-seq peaks. 
To compare the stage-specificity of distal ATAC-seq peaks and promoter ATAC-seq 
peaks across early developmental stages, the Shannon entropy score for normalized 
RPKM for each peak was calculated across five stages and cell types. Weak ATAC- 
seq peaks (RPKM <0.5 at any stage) were removed from this analysis. 

Comparison between distal ATAC-seq peaks and somatic tissue enhancers. To 
compare the distal ATAC-seq peaks identified in early embryos with the annotated 
somatic tissue enhancers”, the overlap between the distal ATAC-seq peaks in early 
mouse embryos and somatic tissue enhancer data sets was calculated. 

Identification of stage-specific distal ATAC-seq peaks. The distal ATAC-seq 
peaks from embryos of all stages and mESCs were combined, with overlapped 
peaks merged. The average RPKM values were calculated for these distal ATAC-seq 
peaks which were further normalized by the Z-score normalization. A Shannon- 
entropy-based method™ was used to identify stage-specific distal ATAC-seq peaks. 
We selected those with entropy less than 2 as candidates for stage-specific distal 
ATAC-seq peaks. The stage-specific distal ATAC-seq peaks were further defined 
based on the following criteria: the distal ATAC-seq peak is highly active in this 
stage (normalized RPKM >1), and its activity (normalized RPKM >0) cannot be 
observed at more than two additional stages. The resulting distal ATAC peaks were 
then reported in the final stage-specific distal ATAC-seq peak list. The functional 
enrichment for genes that are near stage-specific distal ATAC-seq peaks was 
analysed using the GREAT tool with default settings”. 

Prediction of promoter targets of putative enhancers. To identify the potential 
targeted genes for each putative enhancer (distal peaks), we computed the averaged 
ATAC-seq enrichment (normalized RPKM) for all distal ATAC-seq peaks and 
annotated promoters (TSS ± 2.5 kb). We calculated the correlation across the 5 cell 
types and stages between the ATAC-seq enrichment at distal ATAC peaks and each 
promoter within 100 kb. The promoter with a Pearson correlation coefficient of 
more than 0.8 was selected as the potential target of the enhancer. As a control, for 
each promoter-enhancer pair, we permuted promoters and putative enhancers 
to generate a random promoter-enhancer pair of equal distance. 
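
The pairing rule above can be sketched as follows, assuming each distal peak and promoter is summarized by a chromosome, a single coordinate and a vector of normalized RPKM values across the five stages and cell types. The data layout and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def link_enhancers(distal_peaks, promoters, max_distance=100_000, min_r=0.8):
    """Pair each distal peak with promoters within 100 kb whose ATAC-seq
    enrichment profile correlates with it at r > 0.8."""
    links = []
    for peak_id, (chrom, position, signal) in distal_peaks.items():
        for gene, (p_chrom, tss, p_signal) in promoters.items():
            if chrom != p_chrom or abs(position - tss) > max_distance:
                continue
            r = pearson(signal, p_signal)
            if r > min_r:
                links.append((peak_id, gene, r))
    return links
```

With only five data points per profile, high correlations can arise by chance, which is why a distance-matched random pairing is a useful control.
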

Connecting TFs to target genes through distal peak-promoter interactions. To 
find the sequence motifs enriched in distal ATAC-seq peaks, findMotifsGenome.pl 
from the HOMER program” was used. The annotatePeaks.pl script was then used to 
identify specific peaks that contain certain motifs. To connect TFs to genes at a 
particular developmental stage, we first identified TF motifs that are present in 
distal ATAC-seq peaks and selected those that are highly enriched (P value < 1 × 10⁻¹⁰). 
We then assigned these TF motifs to genes by the distal-promoter peak pairs 
established in this work. If multiple putative enhancers were assigned to the same 
promoter, the numbers of TF motifs within these enhancers assigned to a gene were 
then summed. If a gene receives no assignment of motifs for a TF from its linked 
enhancers, the number of TF motifs for this gene is 0. 
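
The summation step can be sketched as below. The inputs are assumed to be a list of (enhancer, gene) links and a per-enhancer table of motif counts; all identifiers in the example are placeholders rather than real loci.

```python
def motif_counts_per_gene(links, motifs_in_enhancer, tfs):
    """Sum TF motif counts over all enhancers linked to each gene.

    links: iterable of (enhancer_id, gene) pairs.
    motifs_in_enhancer: dict enhancer_id -> {tf: motif count}.
    tfs: TFs of interest; genes with no motif for a TF get a count of 0.
    """
    counts = {}
    for enhancer, gene in links:
        gene_counts = counts.setdefault(gene, {tf: 0 for tf in tfs})
        for tf, n in motifs_in_enhancer.get(enhancer, {}).items():
            if tf in gene_counts:
                gene_counts[tf] += n
    return counts

# Usage with placeholder identifiers
links = [("e1", "geneA"), ("e2", "geneA"), ("e3", "geneB")]
motifs = {"e1": {"TF1": 2}, "e2": {"TF1": 1, "TF2": 1}}
result = motif_counts_per_gene(links, motifs, ["TF1", "TF2"])
```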

Inference of master regulators between ICM and TE with MARINa. Master 
regulator analysis was performed using MARINa™. Specifically, we first identified 
a total of 11 representative ICM and 10 representative TE single cells with high 
levels of Pou5f1 transcripts but low Cdx2 expression, or high Cdx2 but low Pou5f1 
expression, respectively, using a published single-cell transcriptome data set!?. 
These single cells were considered as replicates of ICM and TE for identifying 
ICM- and TE-enriched genes with consideration of their expression variations. 
Next, for the candidate TFs identified in ICM by HOMER, their target genes 
were predicted based on TF motif enrichment in promoter-interacting enhancer 
regions as described above. The positively regulated (or negatively regulated) 
targets were defined as those genes of which the expression shows positive (or 
negative) correlation with that of the TF across all individual cells from blastocysts. 
For each TF, MARINa then assesses enrichment of the predicted targets in the 
differentially expressed genes between ICM and TE. If the positively regulated and 
negatively regulated targets are enriched for ICM-specific genes and TE-specific 
genes, respectively, the corresponding TF is defined as a candidate ICM regulator. 
Conversely, if the positively regulated and negatively regulated targets are enriched 
for TE-specific genes and ICM-specific genes, respectively, the TF is defined as a 
candidate TE regulator. Statistical significance of such enrichment was estimated by 
comparing the enrichment score with those from the same analysis after 10,000 
sample permutations. TFs with FDR-corrected P values <0.01 were considered 
significant and thereby inferred as master regulators between ICM and TE. 

Selection of downregulated genes for the Gata4 and Nr5a2 knockdown 
embryos. Genes that showed twofold downregulation upon Gata4 knockdown 
(average of four single embryos) compared to GFP knockdown (average of three 
single embryos) or upon Nr5a2 knockdown (average of four single embryos) 
compared to GFP knockdown (average of four single embryos) were selected. 
Genes with low expression in both GFP and knockdown embryos (FPKM < 1) 
were removed. 

Analysis of total RNA-seq data. Total RNA-seq data were downloaded from Abe 
et al.!'. An RPKM file was generated using 100-bp bins for each data set and was 
visualized in the UCSC genome browser. To specifically investigate the promiscuous 
intergenic transcription, exonic transcripts (usually RPKM >2) from genes need 
to be excluded, including those inherited from oocytes and those derived from 
genes activated after the major ZGA. Therefore, we excluded bins with RPKM 
values above 2 from each total RNA-seq data set and the resulting data were used for 
downstream analysis. 

Identification of the broad domains of open chromatin in the early 2-cell 
embryos. To identify all possible broad domains of open chromatin in the genome, 
we merged peaks within 50 kb and those with sizes more than 10 kb were identified 
as broad domains. 
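
The merging rule can be sketched as follows for peaks on a single chromosome. Interpreting "within 50 kb" as an edge-to-edge gap of at most 50 kb is an assumption, and the function name is hypothetical.

```python
def broad_domains(peaks, max_gap=50_000, min_size=10_000):
    """Merge peaks separated by at most max_gap and keep merged regions
    longer than min_size. Peaks are (start, end) pairs on one chromosome."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s > min_size]
```

In this sketch an isolated 1-kb peak is discarded, while two peaks 29 kb apart are fused into a single domain that passes the 10-kb size filter.
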

Extended Data Figure 1 | Development of CARM, a method to deplete 
mitochondrial DNA from ATAC-seq library. a, A snapshot of the UCSC 
browser view shows enrichment of ATAC-seq (with replicates) in early 
embryos and mESCs, as well as the enrichment of DNase-seq in mESCs. 
b, Scatter plots comparing the ATAC-seq enrichment (RPKM, 5-kb- 
window for the entire genome) between samples using various numbers 
of mESCs. The Pearson correlation coefficients are shown. c, Bar charts 
showing the average percentages of monoclonal nuclear DNA reads in 
ATAC-seq sequencing libraries before and after mitochondrial DNA 
(mtDNA) depletion for early mouse embryos at each developmental 
stage. d, Schematic of mtDNA depletion for the ATAC-seq library. 
mtDNA in ATAC-seq library was digested by Cas9 with sgRNAs targeting 
mtDNA. The digested mtDNA fragments cannot form sequencing clusters during 
sequencing due to the lack of adaptors on both ends. e, Bar chart shows the 
mtDNA depletion efficiency for samples of different stages. The depletion 
efficiency refers to the percentages of total mtDNA reads that were 
removed by CARM. f, The DNA gel showing the Cas9 digestion efficiency 
for mtDNA. A total of sixteen 1 kb mtDNA amplicons covering the entire 
mitochondrial genome were pooled and subjected to Cas9 digestion 
together with mtDNA sgRNAs. g, Scatter plots showing the correlation of 
ATAC-seq enrichment for the 2-cell embryos between pre-depletion and 
post-depletion. A similar analysis was performed between sequencing and 
re-sequencing results for the same library as a control. 

Extended Data Figure 2 | Validation of ATAC-seq and RNA-seq data in 
early mouse embryos. a, The Spearman correlation between the replicates 
of RNA-seq samples and between RNA-seq in this study and Deng et al.'* 
at stages when available. b, Scatter plots comparing the enrichment (5-kb 
window for the entire genome) between ATAC-seq replicates for mouse 
preimplantation embryos. The Pearson correlation of the RPKM values is 
shown. c, Box plots comparing the levels of gene expression and promoter 
ATAC-seq enrichment in the 2-cell embryos and ICM for genes with 
different promoter CG densities. Genes specifically expressed at the 2-cell 
stage were analysed. d, A snapshot of the UCSC genome browser shows the 
global view of ATAC-seq, H3K27ac and H3K27me3 ChIP-seq enrichment 
in the 2-cell embryos. e, The overlaps between ATAC-seq peaks of early 
embryos at each stage and H3K27ac ChIP-seq peaks in the 2-cell embryos 
for promoter peaks (left) or distal peaks (right). The overlaps of random 
promoters or peaks and H3K27ac ChIP-seq peaks in the 2-cell embryos 
are included as controls. 

Extended Data Figure 3 | Allelic accessible chromatin in early mouse 
embryos. a, The percentages of regions (covered by at least 8 reads with 
assigned parental origins) defined as allele specific regions (blue) at various 
P-value cutoffs are plotted. As a control, these reads were randomly assigned 
to either allele and a similar percentage was calculated (red, average of 5 
random permutations). A P value of 0.001 was selected as the cutoff to give 
the FDR (random to observed; green) less than 2% for each stage. b, Scatter 
plots showing the numbers of ATAC-seq reads (top) assigned to each allele 
in SNP-containing regions and RNA-seq reads assigned to each allele for 
SNP-containing genes (middle and bottom). For RNA-seq analysis, either all 
genes (middle) or ZGA-only genes (bottom) were used. The purple denotes 
allele-specific ATAC-seq regions or allele-specific genes. The red, yellow and 
green denote biallelic ATAC-seq regions or biallelic genes, with green to red 
showing high to low point densities. The numbers show the percentages of 
maternal- and paternal-specific ATAC-seq regions or genes. c, Snapshots 
show the allelic ATAC-seq signals and RNA-seq signals near the imprinted 
locus Snrpn and near the gene Etv6. Note, not all regions are covered by 
SNPs. d, Box plots showing the ratio of paternal versus maternal read 
numbers for all regions in the genome (1-kb window) covered by at least 8 
reads with assigned parental origins. The analysis was performed for either 
autosomes (blue) or the X chromosome (red). e, Bar charts showing the 
percentages of allele specific genes (combining all stages) that contain allele 
specific ATAC-seq peaks at the same stages in nearby regions (within 40 kb). 
A P value based on the hypergeometric distribution is shown. 

Extended Data Figure 4 | Allelic ATAC-seq enrichment and DNA 
methylation levels at ATAC-seq peaks. a, The allelic average ATAC-seq 
enrichment and DNA methylation levels around biallelic, maternal- and 
[…] measures 5hmC, are shown for both promoter peaks (left) and distal 
peaks (right) in the 2-cell embryos. The fCAB-seq and TAB-seq data 
were obtained from a previous study'®. Notably, MethylC-seq measures 
the sum of 5mC and 5hmC. As the levels of 5hmC on both alleles are much […] 

Extended Data Figure 5 | Analyses of ATAC-seq peaks at transcription end sites. a, The average levels of ATAC-seq enrichment for genes of different 
expression levels (ZGA-only) in the 2-cell embryos. b, The Gene Ontology analysis results for the top 1,000 genes ranked by TES ATAC-seq enrichment 
(average enrichment from TES to 2 kb downstream) in the 2-cell embryos. 


© 2016 Macmillan Publishers Limited. All rights reserved 




[Extended Data Figure 6 image. Panel titles: a, Distal ATAC-seq peaks overlapping 
with somatic enhancers; b, Expression of repeats; c, ATAC-seq peaks vs. repeats 
(no overlap, any overlap, or ≥20% peak overlap with repeats); d, genome browser 
snapshots of ATAC-seq, DNase-seq, RNA-seq and repeat tracks at 
chr9:18,208,300-18,218,103, chr1:9,997,200-10,010,774 and chr1:5,471,191-5,480,587; 
e, Percentage of stage-specific genes with TSSs directly overlapping with repeats; 
f, Types of repeats overlapping with TSSs of stage-specific genes; g, Enrichment of 
repeats in H3K27ac peaks (log ratio of observed/random); h, Density of repeats 
(-10 kb, TSS, TES, +10 kb).] 

Extended Data Figure 6 | Widespread presence of repetitive elements in 
accessible chromatin in early mouse embryos. a, Bar charts showing the 
[…] repeats, or with at least 20% of the peak regions overlapping with repeats. 
d, The snapshots of promoter (left) or distal (middle and right) ATAC-seq 
peaks near various types of repeats. e, Percentages of stage-specific genes […] 

[…] (right). c, Box plots showing the expression levels of all chromatin 
regulator genes (as identified by the GREAT tool) near stage-specific 
[…] terms are also shown (right). 

Extended Data Figure 8 | Analysis of single-cell RNA-seq and 
identification of putative promoter-enhancer pairs in early embryos. a, 
A heat map shows the differential expression levels between ICM and TE 
for all genes ranked by their ICM:TE expression ratio (left). Genes with 
low expression in both ICM and TE (FPKM <1) were removed. Selected 
ICM and TE regulators/markers are shown with their differential expression 
rank indicated in parentheses (middle). Positive and negative ranks indicate 
the ranking in ICM-specific and TE-specific genes, respectively. Examples 
of ICM-specific genes are shown on the right, including those exclusively 
expressed in ICM (right top) or known ICM regulators (right bottom). b, 
Schematic shows the method to identify target genes of a TF. The correlation 
of ATAC-seq enrichment of distal peaks and their nearby (within 100 kb) 
promoter peaks was calculated across all developmental stages. Distal-
promoter peak pairs showing high correlation are considered as putative 
enhancer-promoter pairs (see Supplementary Information). Candidate 
regulatory TFs were identified by searching motifs in distal peaks, and 
distal-peak linked promoters were defined as the potential targets of TFs. 
c, Heat maps showing the ATAC-seq enrichment at stage-specific distal 
ATAC-seq peaks, their predicted target promoters, and the expression of 
corresponding genes. Only genes that are not expressed in oocytes (ZGA-
only) were included in this analysis. d, Box plots showing the correlation 
between enrichment levels of distal ATAC-seq peaks and expression of their 
predicted target genes, compared to random distal peak-promoter peak 
pairs. A P value based on the t-test is shown. 

Extended Data Figure 9 | NR5A2 regulates expression of key TFs for 
ICM and TE at the 8-cell stage. a, Bar charts showing the expression levels 
of Nr5a2 in preimplantation embryos using data from this study or 
Deng et al.'”. The error bars denote the standard deviation of FPKM values 
across two RNA-seq replicates. b, Heat maps show the correlation between 
the expression of Nr5a2 and all genes across individual single cells from 
the 8-cell embryos (left) or blastocysts (right) using data from Deng 
et al.'*. Scatter plots comparing expression of Nr5a2 with Pou5f1 or Cdx2 
across individual cells from the 8-cell embryos or blastocysts. c, Schematic 
of the Nr5a2 knockdown experiments. A Venn diagram shows the overlap 
between Nr5a2-knockdown downregulated genes and 8-cell-specific 
genes. A P value based on the hypergeometric distribution is shown. 
[Figure label: p-value=1.5E-7] 

Extended Data Figure 10 | Transcription and chromatin states in minor 
ZGA. a, A snapshot of the UCSC genome browser shows an example of 
early 2-cell gene family (Zscan4) that reside in clusters in the genome. 
b, Box plots show the expression levels of ZGA-only genes that are 
activated by the 2-cell stage (activated either at zygote, early 2-cell or 
2-cell stages). c, Pie chart shows the percentages of early 2-cell genes that 
reside in clusters or are solitary. d, Bar chart shows the numbers of ATAC-
seq peaks identified at each stage. Two programs were used (MACS and 
HOMER”) to verify the ATAC-seq peak analyses in early 2-cell stages. 
e, Bar chart shows the genome coverages by ATAC-seq peaks at different 
stages called either by MACS or HOMER. f, The enrichment (log ratio 
of observed to random) of repeat subfamily in ATAC-seq promoter and 
distal peaks in the early 2-cell embryos. g, Bar chart shows the expression 
levels of MERVL in oocyte and preimplantation embryos. h, The average 
ATAC-seq enrichment in the early 2-cell embryos is shown for regions 
near MERVLs with high, median and low levels of expression. The 
MERVL region is not shown due to the low mappability. i, Bar chart 
shows the numbers of early 2-cell genes that fall into the broad domains 
of open chromatin. A similar analysis for a set of random domains 
(of equal length as each corresponding broad domain) was performed 
as a control. The P value was calculated based on the hypergeometric 
distribution. j, Bar chart shows the expression levels of Zfp352 in oocyte 
and preimplantation embryos. 

A Neptune-sized transiting planet closely orbiting a 
5-10-million-year-old star 

David R. Ciardi®, Andrew W. Howard’, Howard T. Isaacson®, Ann Marie Cody’, Joshua E. Schlieder?, Charles A. Beichman® & 
Scott A. Barenfeld! 


Theories of the formation and early evolution of planetary systems 
postulate that planets are born in circumstellar disks, and undergo 
radial migration during and after dissipation of the dust and gas disk 
from which they formed’. The precise ages of meteorites indicate 
that planetesimals—the building blocks of planets—are produced 
within the first million years of a star’s life*. Fully formed planets are 
frequently detected on short orbital periods around mature stars. 
Some theories suggest that the in situ formation of planets close to 
their host stars is unlikely and that the existence of such planets is 
therefore evidence of large-scale migration*». Other theories posit 
that planet assembly at small orbital separations may be common®*. 
Here we report a newly born, transiting planet orbiting its star with 
a period of 5.4 days. The planet is 50 per cent larger than Neptune, 
and its mass is less than 3.6 times that of Jupiter (at 99.7 per cent 
confidence), with a true mass likely to be similar to that of Neptune. 
The star is 5-10 million years old and has a tenuous dust disk 
extending outward from about twice the Earth-Sun separation, in 
addition to the fully formed planet located at less than one-twentieth 
of the Earth-Sun separation. 

The star [PGZ2001] J161014.7-191909, hereafter K2-33, is an 
M-type star several million years (Myr) old that was observed by 
NASA's Kepler Space Telescope during campaign 2 of the K2 mission. 
The star was identified as one of more than 200 candidate planet hosts 
in a systematic search for transits in K2 data’. As part of our ongoing 
study of the pre-main-sequence population of Upper Scorpius observed 
by K2, we independently verified and analysed the planetary transit 
signal. We acquired radial velocity and high spatial resolution obser- 
vations at the W. M. Keck Observatory to confirm the detection of the 
planet, named K2-33 b, and to measure its size and mass. 

Within the 77.5-day photometric time series of K2-33 (Kp = 
14.3 mag), there are periodic dimmings of 0.23% lasting 4.2 h and 
occurring every 5.4 days (Fig. 1). The ensemble of transits is detected 
at a combined signal-to-noise ratio of about 32. During the K2 observations, 
cool, dark regions on the stellar surface (starspots) rotated in 
and out of view, producing semi-sinusoidal brightness variations of 
~3% peak-to-trough amplitude with a periodicity of 6.3 ± 0.2 days 
(Extended Data Fig. 1). We removed the starspot variability before 
modelling the transit events. We fitted the transit profiles using estab- 
lished methods’, measuring the planet's size relative to its host star and 
its orbital geometry (Table 1). 

K2-33 is a member of the Upper Scorpius OB association!"!’, the 
nearest site to Earth of recent massive star formation (at a distance 
of 145 ± 20 pc). Approximately 20% of low-mass stars in Upper 
Scorpius host protoplanetary disks’, indicating that planet formation 
is ongoing in the region but in an advanced stage or completed for 
the majority of stars. The age of the stellar association is 5-10 Myr, 
as assessed from kinematics, the Hertzsprung–Russell diagram, and 
eclipsing binary analyses. The youth of K2-33 itself is based on the 
spectroscopic indicators of enhanced hydrogen emission and lith- 
ium absorption'’!”, which we confirm from Keck spectra (Table 1). 
Furthermore, the stellar rotation rate we measure via broadening of 
absorption lines in the spectra and via the starspot period (Table 1), is 
rapid relative to field-age stars of similar mass'*. We determined the 
star’s systemic radial velocity (Table 1) to be consistent with the mean 
value for Upper Scorpius members’*. Previously, proper motions were 
used to assess the probability of membership in the association at 99.9% 
(ref. 16). Finally, the positions of the star in the Hertzsprung–Russell 
and temperature–density diagrams (Extended Data Fig. 2) are consistent 
with the sequence of low-mass members of Upper Scorpius’”. 

The inferred planetary size and mass depend directly upon the 
host star size and mass. We evaluated the effective temperature and 
luminosity from our newly determined spectral type (Table 1), extinction- 
corrected catalogue near-infrared photometry, and empirical 
pre-main-sequence calibrations!”!®. With the temperature and luminosity, 
we derive a stellar radius from the Stefan–Boltzmann law of 
R⋆ = (1.1 ± 0.1) R⊙, where R⊙ is the radius of the Sun. The radius 
uncertainty is calculated accounting for recommended errors in 
temperature!’, photometric errors, and assuming an association depth 
comparable to its width on the sky. Combining the stellar radius with 
the planet-to-star radius ratio determined from the K2 light-curve fit, 
we infer a planetary radius for the companion of Rp = (5.8 ± 0.6) R⊕, 
where R⊕ is Earth’s radius, or about 50% larger than Neptune. 
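
The two radius estimates can be checked with a short calculation. This is a sketch using the Table 1 values (log(L⋆/L⊙) = -0.83, Teff = 3,410 K, Rp/R⋆ = 0.0476); the solar effective temperature and the Sun-to-Earth radius ratio are standard reference values supplied here, not taken from the paper.

```python
from math import sqrt

T_SUN = 5772.0            # K, nominal solar effective temperature (assumed)
R_SUN_IN_R_EARTH = 109.1  # approximate solar radius in Earth radii (assumed)

def stellar_radius_rsun(log_l_lsun, teff):
    """Stefan-Boltzmann law in solar units: R/Rsun = sqrt(L/Lsun) * (Tsun/T)^2."""
    return sqrt(10 ** log_l_lsun) * (T_SUN / teff) ** 2

def planet_radius_rearth(radius_ratio, rstar_rsun):
    """Planet radius in Earth radii from the planet-to-star radius ratio."""
    return radius_ratio * rstar_rsun * R_SUN_IN_R_EARTH

rstar = stellar_radius_rsun(-0.83, 3410.0)   # about 1.1 solar radii
rplanet = planet_radius_rearth(0.0476, rstar)
```

The resulting planet radius falls within the quoted (5.8 ± 0.6) R⊕; the small offset from the tabulated 5.76 R⊕ reflects rounding of the inputs and the constants chosen here.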

We estimate a stellar mass of M⋆ = (0.31 ± 0.05) M⊙, where M⊙ is the 
mass of the Sun, by interpolation among pre-main-sequence stellar 
evolution models’’, consistent with a previously reported value”’. The 
mass uncertainty assumes normal error distributions in temperature 
and luminosity. As there is no evidence for radial velocity variations 
among four high-dispersion Keck spectra (Extended Data Table 1), the 
planet mass is constrained from the maximum-amplitude Keplerian 
curve that is consistent within the errors with all radial velocity meas- 
urements (Extended Data Fig. 3). Given the transit ephemeris and 
assuming a circular, edge-on orbit, the expected stellar reflex veloc- 
ity is a sinusoid having a single free parameter: the semi-amplitude. 
Radial velocity semi-amplitudes of >900 m s⁻¹ are ruled out at 99.7% 
confidence, corresponding to a 3σ upper limit on the mass of K2-33 b 
of 3.6 MJup, where MJup is the mass of Jupiter. 
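
The conversion from a semi-amplitude limit to a mass limit can be sketched as below, assuming a circular, edge-on orbit as stated above. The physical constants are standard values supplied for this illustration, and the fixed-point inversion is one convenient way to solve for the mass, not necessarily the method the authors used.

```python
from math import pi

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30    # kg
M_JUP = 1.898e27    # kg
M_EARTH = 5.972e24  # kg
DAY = 86400.0       # s

def semi_amplitude(planet_mass_kg, star_mass_kg, period_s):
    """Stellar reflex semi-amplitude (m/s) for a circular, edge-on orbit."""
    return ((2 * pi * G / period_s) ** (1 / 3) * planet_mass_kg
            / (star_mass_kg + planet_mass_kg) ** (2 / 3))

def planet_mass(k_ms, star_mass_kg, period_s, iterations=20):
    """Invert semi_amplitude for the planet mass by fixed-point iteration."""
    mass = 0.0
    for _ in range(iterations):
        mass = (k_ms * (star_mass_kg + mass) ** (2 / 3)
                / (2 * pi * G / period_s) ** (1 / 3))
    return mass

period = 5.42513 * DAY
star = 0.31 * M_SUN
upper_limit_mjup = planet_mass(900.0, star, period) / M_JUP  # about 3.6
```

The same function run forwards gives the semi-amplitude expected for any assumed planet mass, for example `semi_amplitude(6 * M_EARTH, star, period)`.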

The true mass of K2-33 b is likely to be at least an order of magnitude 
smaller. There are seven known exoplanets of similar size (Rp = 4.8-6.6 R⊕) 
with densities measured to 50% or better. These planets have 
masses ranging from (6.3 ± 0.8) M⊕ (for Kepler-87 c; where M⊕ is 
Earth’s mass) to (69 ± 11) M⊕ (for CoRoT-8 b) owing to varying core 
masses. Thus, plausible masses of K2-33 b range from about 6 M⊕ to 
70 M⊕, corresponding to radial velocity semi-amplitudes of 5-56 m s⁻¹. 
An even lower mass may be implied if the young planet is still undergoing 
Kelvin-Helmholtz radial contraction. 

Figure 1 | Light curve of K2-33. a-d, K2 photometry in twenty-day 
segments. K2-33 varies in brightness by ~3% every 6.3 days owing to the 
rotation of its spotted surface. The planet K2-33 b transits its star every 
5.4 days (red ticks). Another potential transit (blue tick in d), distinct 
from the K2-33 b transits, is possibly caused by a second planet with an 
orbital period of >77.5 days. e, Stellar variability was removed before 
fitting transits. The data gap is due to excluded observations where the 
variability fit inadequately captured a systematic artefact in the light 
curve. f, K2 photometry folded on the planet’s orbital period with a 
transit model fit (red curve). BJD, barycentric Julian date. 

A semi-major axis of 0.04 au (~8 R⋆) is measured for K2-33 b from 
the orbital period and Kepler's third law, adopting the value of M⋆ in 
Table 1. The orbit is near the silicate dust sublimation radius, as well as 
the co-rotation radius, where some protoplanetary disk theories predict 
a magnetospheric cavity extending to the stellar surface”’. At this sep- 
aration, the blackbody equilibrium temperature of the planet is 850 K. 
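
These numbers can be reproduced with a short calculation. The sketch below uses Kepler's third law in solar units (a³ = M⋆P², neglecting the planet's mass) and assumes zero albedo and full heat redistribution for the blackbody temperature; the solar radius in au is a standard value supplied here.

```python
from math import sqrt

R_SUN_AU = 0.00465047  # solar radius in au (assumed reference value)

def semi_major_axis_au(period_days, star_mass_msun):
    """Kepler's third law in solar units: a^3 = M * P^2 (au, Msun, yr)."""
    period_years = period_days / 365.25
    return (star_mass_msun * period_years ** 2) ** (1 / 3)

def equilibrium_temperature(teff, star_radius_rsun, a_au):
    """Blackbody equilibrium temperature: Teq = Teff * sqrt(R_star / (2a))."""
    return teff * sqrt(star_radius_rsun * R_SUN_AU / (2 * a_au))

a = semi_major_axis_au(5.42513, 0.31)          # about 0.041 au
a_in_rstar = a / (1.1 * R_SUN_AU)              # about 8 stellar radii
teq = equilibrium_temperature(3410.0, 1.1, a)  # about 850 K
```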

We have interpreted the observed transit as a single planet orbiting 
K2-33. Other interpretations involving eclipsing stellar binaries resid- 
ing within the K2 aperture (Extended Data Fig. 4) would be diluted by 
the light of K2-33 and could potentially mimic the observed transit. 
Table 1 | System properties of K2-33 

Parameter                                          Value               Uncertainty 

Stellar properties 
2MASS designation                                  J16101473-1919095 
EPIC designation                                   205117205 
Right ascension (J2000.0)                          16h 10min 14.738s 
Declination (J2000.0)                              -19° 19' 09.55" 
Proper motion, RA, μα (mas yr⁻¹)                   -9.8                ±1.7 
Proper motion, dec., μδ (mas yr⁻¹)                 -24.2               ±1.8 
Kepler magnitude, Kp (mag)                         14.3 
Cluster distance, d (pc)                           145                 ±20 
Kinematic distance, dkin (pc)                      139                 ±11 
Spectral type                                      M3                  ±0.5 
V-band extinction, AV (mag)                        1.29 
Luminosity, log(L⋆/L⊙) (dex)                       -0.83               ±0.07 
Effective temperature, Teff (K)                    3,410               ±75 
Stellar radius, R⋆ (R⊙)                            1.1                 ±0.1 
Stellar mass, M⋆ (M⊙)                              0.31                ±0.05 
Mean stellar density, ρ⋆ (g cm⁻³)                  0.34                ±0.12 
Surface gravity, log(g⋆) (dex)                     3.84                ±0.16 
Stellar rotation period, Prot (days)               6.3                 ±0.2 
Systemic radial velocity, γ (km s⁻¹)               -6.6                ±0.1 
Projected rotational velocity, vsin(i) (km s⁻¹)    5-9 
Hα equivalent width (Å)                            -1.3                ±0.1 
H8 equivalent width (Å)                            1.05                ±0.05 
Li I 6,708 Å equivalent width (Å)                  0.60                ±0.05 

Light-curve modelling parameters 
Orbital period (days)                              5.42513             … 
Time of mid-transit, t0 (BJD; days)                2,456,936.6665      … 
Transit duration, T14 (hours)                      4.22                … 
Planet-to-star radius ratio, Rp/R⋆                 0.0476              … 
Scaled semi-major axis, R⋆/a                       0.109               … 
Impact parameter, b                                0.49                … 
Inclination, i (deg)                               86.9                … 
Mean stellar density, ρ⋆,circ (g cm⁻³)             0.49                … 
Linear limb darkening coefficient, u               0.603               … 

Planet properties 
Planet radius, Rp (R⊕)                             5.76                … 
Planet mass, Mp (MJup)                             <3.6 
Semi-major axis, a (au)                            0.0409              +0.0021/-0.0023 
Blackbody equilibrium temperature, Teq (K)         850                 ±50 


Right ascension and declination originate from 2MASS, proper motions from UCAC4, and Kepler 
magnitude from the Ecliptic Plane Input Catalog. The mean cluster distance®° is assumed, with 
uncertainty equal to the presumed cluster depth!*. Quoted transit parameters and uncertainties 
are the medians and 15.87%, 84.13% percentiles of the posterior distributions. 


Such a putative eclipsing stellar binary could be associated with (that is, 
gravitationally bound to) K2-33, or unassociated but aligned by chance. 
Given constraints from our suite of follow-up observations, we show 
that the chance of an eclipsing stellar binary producing the observed 
transit is vanishingly small. 

We first consider unassociated eclipsing stellar binaries, which we
characterize by their sky-projected separation from K2-33 and their
brightness relative to K2-33 in the Kepler bandpass, ΔKp. Eclipse
depths may not exceed 100%; thus eclipsing stellar binaries with
ΔKp > 6.6 mag cannot account for the 0.23% observed transit depth.
We do not detect closely projected companions in seeing-limited and 
multi-epoch adaptive optics images (Extended Data Fig. 5), nor in 
searches for secondary lines in high-resolution optical spectra. These 
observational constraints, shown in Fig. 2a, eliminate nearly all scenarios 
involving unassociated eclipsing stellar binaries. The probability of an 
eclipsing stellar binary lurking in the remaining parameter space is 
<4 x 107° (see Methods). 
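The contrast limit quoted in this paragraph follows from simple dilution arithmetic, which the following sketch illustrates. It assumes (our simplification) that the contaminant is much fainter than K2-33, so that its fractional contribution to the total flux is about 10^(−0.4 Δm); the function name is ours.

```python
import math


def max_contrast_mag(observed_depth, max_eclipse_depth=1.0):
    """Largest magnitude difference (contaminant minus target) at which a
    blended eclipsing binary could still produce `observed_depth`.

    A contaminant fainter than the target by dm magnitudes contributes a
    flux fraction of about 10**(-0.4 * dm) when it is much fainter than
    the target, so an eclipse of fractional depth `max_eclipse_depth` is
    diluted to roughly max_eclipse_depth * 10**(-0.4 * dm).
    """
    return -2.5 * math.log10(observed_depth / max_eclipse_depth)


depth = 0.0476 ** 2                         # (Rp/R*)^2 from Table 1, ~0.23%
limit_total = max_contrast_mag(depth)       # eclipses of up to 100%
limit_half = max_contrast_mag(depth, 0.5)   # eclipses of up to 50%
print(f"100% eclipses: dKp < {limit_total:.1f} mag")
print(f" 50% eclipses: dKp < {limit_half:.1f} mag")
```

The same function with a 50% maximum eclipse depth gives the tighter contrast limit that applies when the eclipsing pair cannot fully occult one another.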

We now consider associated eclipsing stellar binaries in terms of their
physical distance to K2-33, d, and ΔKp. As in the case of unassociated
eclipsing stellar binaries, our imaging and spectroscopic data eliminate
the vast majority of associated eclipsing stellar binary configurations.
Additionally, the lack of detectable line-of-sight acceleration over
the baseline of the observations rules out associated eclipsing stellar
binaries in a mass-dependent manner (see Methods). The constraints
provided by these complementary observations are depicted in Fig. 2b.
However, some scenarios having d = 1-3 au and ΔKp = 2-6 mag
cannot be conclusively eliminated. Nearly all of these scenarios involve
a planet orbiting either K2-33 or an undetected companion. If orbiting
K2-33, the planet radius is at most 7.6% larger than the value reported
in Table 1 (that is, within quoted uncertainties) owing to dilution from
a companion. If orbiting a stellar or substellar companion to K2-33,
the planet radius is at most ~1.8RJup. Only for ΔKp > 6 mag does the
radius of the transiting object correspond to a mass of ≳13MJup at
5-10 Myr, where MJup is the mass of Jupiter. However, coeval eclipsing
brown dwarfs probably would not produce eclipse depths >50%, and
thus contrasts of ΔKp > 5.8 mag need not be considered. Furthermore,
the low occurrence of brown dwarfs, combined with the lack of
observed secondary eclipses, makes such configurations extremely
unlikely. Given the Kepler photometry and observational constraints,
we quantitatively assessed the overall false-positive probability, using
an established statistical framework (see Methods). We found that
scenarios involving a single star and planet are 10¹¹ times more likely
than scenarios involving eclipsing stellar binaries.

Spitzer Space Telescope observations of K2-33 revealed 24-μm emission
in excess of the expected stellar photosphere by 50%, indicating
the presence of cool circumstellar dust. There is an absence, however,
of warm dust close to the star, given the lack of similar infrared excess
at wavelengths shorter than 16 μm (ref. 25). The spectral energy
distribution is best fitted by including a dust component at 122 K having
an inner edge at 2.0 au. These data suggest that the inner regions of
the previously present protoplanetary disk have cleared. Supporting
this inference is the modest Hα signature (Table 1), which is consistent
with chromospheric emission and indicates that the star is not
accreting gas.

At the age of K2-33, it is unclear whether the dust structure consists
of debris resulting from the collisional grinding of planetesimals, or
whether it is a remnant of the initial dust- and gas-rich disk. One
possibility is that the inner-disk regions have been cleared of dust by the
gravitational influence of one or more planetary mass bodies. Our
detection of a short-period planet in a transitional disk lends support
to this explanation.

A flux upper limit at 880 μm from the Atacama Large Millimeter
Array combined with the measured Spitzer fluxes yields a constraint
on the mass of dust remaining in the disk of less than 0.2M⊕ (ref. 27).
Additionally, CO emission, a tracer of molecular hydrogen, was not
detected, indicating that the primordial gas disk also has largely or
entirely dissipated.

The transiting planet around the young star K2-33 provides direct
evidence that large planets can be found at small orbital separations
shortly after dispersal of the nebular gas. Migration via tidal
circularization of an eccentric planet, through, for example, the Kozai-Lidov
mechanism, planet-planet scattering, or secular chaos, proceeds over
timescales much greater than disk dispersal timescales, and thus cannot
explain the planet's current orbit. In situ formation, or formation
at a larger separation followed by migration within the gas disk, are
permitted scenarios given current observations.

Figure 2 | Constraints on astrophysical false-positive scenarios.
To confirm the planetary nature of K2-33 b, we considered and eliminated
nearly all false-positive scenarios involving eclipsing stellar binaries.
a, The domain of sky-projected separation and contrast of a putative
unassociated eclipsing stellar binary, aligned with K2-33 by chance.
The blue outlined region shows eclipsing stellar binaries eliminated
using multi-epoch adaptive optics imaging, which leverages stellar
proper motion to provide sensitivity at all separations within 5″.
Green and purple regions represent constraints from optical spectra
and seeing-limited imaging, respectively. Finally, eclipsing stellar binaries
below the grey line cannot account for the observed transit depth and
are eliminated. b, Limits on eclipsing stellar binaries associated with
(that is, gravitationally bound to) K2-33. Constraints from imaging and
spectroscopy are shown as a function of physical separation. The lack of
detectable stellar acceleration provides an additional diagonal constraint
at top left. Δmag, difference in magnitude.

Interestingly, large planets are rarely found close to mature low-mass
stars; fewer than 1% of M-dwarfs host Neptune-sized planets with
orbital periods of <10 days, while ~20% host Earth-sized planets
in the same period range. This may be a hint that K2-33 b is still
contracting, losing atmosphere, or undergoing radial migration. Future
observations may test these hypotheses, and potentially reveal where
in the protoplanetary disk the planet formed.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Stellar membership and properties. The partial kinematics of K2-33 (Table 1)
can be combined with the galactic velocity of the Upper Scorpius subgroup to
estimate distance and predict radial velocity. We calculated these parameters
following established methods, adopting the mean UVW galactic velocities of
the subgroup, and estimated uncertainties using Monte Carlo sampling. We
find dkin = 139 ± 11 pc, consistent with the mean subgroup distance from Earth of
145 ± 2 pc (ref. 30), and a radial velocity of −7.3 ± 0.5 km s⁻¹, within 2σ of the
systemic radial velocity we measure from multiple Keck/HIRES spectra (Extended
Data Table 1).

We determined spectral type from the high-dispersion spectra and adopted an
empirical spectral type to temperature conversion calibrated for young stars
to estimate effective temperature (Table 1). From 2MASS photometry and the
appropriate intrinsic J − H colour for the spectral type, we calculated the J − H
colour excess, E(J − H) = 0.10 mag. Assuming an extinction law, we found visual
and J-band extinctions of AV = 1.29 mag and AJ = 0.30 mag, respectively. After
correcting for extinction, we used the appropriate empirical J-band bolometric
correction for the spectral type and a distribution of distances to calculate luminosity.
While the mean association distance is known precisely, its large sky-projected
area (about 150 deg²) suggests that the association depth is substantial. Results
from a secular parallax study indicate an association distance spread of <50 pc,
and the angular diameter of the association corresponds to a 35-pc spread
at the mean distance. In calculating luminosity, we conservatively considered a
uniform distribution of distances in a cubic volume 40 pc on each side, centred
on the mean association distance of 145 pc (ref. 30). Uncertainty is calculated
from Monte Carlo sampling, accounting for photometric errors, recommended
errors in temperature, and distance uncertainties. We propagated the luminosity
and temperature errors into the radius uncertainty using the
Stefan-Boltzmann law.
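As a rough illustration of the last step, the sketch below evaluates the Stefan-Boltzmann radius from the Table 1 luminosity and effective temperature and propagates their quoted errors by Monte Carlo sampling. It is not the authors' code: the errors are treated as independent Gaussians (our assumption), the solar effective temperature is the IAU 2015 nominal value, and the sample size and seed are arbitrary.

```python
import math
import random

T_SUN_K = 5772.0  # nominal solar effective temperature (IAU 2015)


def stefan_boltzmann_radius(log_lum, t_eff_k):
    """Radius in solar radii from L = 4*pi*R^2*sigma*T^4, in solar units."""
    return math.sqrt(10.0 ** log_lum) * (T_SUN_K / t_eff_k) ** 2


# Nominal Table 1 values: log(L*/Lsun) = -0.83 +/- 0.07, Teff = 3,410 +/- 75 K.
r_nominal = stefan_boltzmann_radius(-0.83, 3410.0)

# Illustrative Monte Carlo propagation, assuming independent Gaussian errors.
rng = random.Random(42)
draws = sorted(
    stefan_boltzmann_radius(rng.gauss(-0.83, 0.07), rng.gauss(3410.0, 75.0))
    for _ in range(20000)
)
r_median = draws[len(draws) // 2]
r_lo = draws[int(0.1587 * len(draws))]   # 15.87th percentile
r_hi = draws[int(0.8413 * len(draws))]   # 84.13th percentile
print(f"R* = {r_nominal:.2f} Rsun (MC median {r_median:.2f}, "
      f"68% interval {r_lo:.2f} to {r_hi:.2f})")
```

The nominal value and the width of the sampled interval are both in line with the stellar radius and uncertainty listed in Table 1.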

Detailed modelling of the transit profile provides a constraint on mean stellar
density, assuming a circular orbit. We found a mean stellar density of
ρ*,circ = 0.49 g cm⁻³, consistent with the value implied by our adopted M* and R*
(Table 1). From the posterior distribution of mean stellar densities and a normal
distribution in effective temperature, we interpolated between pre-main-sequence
models to determine a stellar mass and age of M* = (0.30 ± 0.04)M☉ and
t= 54 Myr, respectively (Extended Data Fig. 2). However, we conservatively adopt 
the ensemble age of the association, as we consider it more robust than the age of 
an individual star. 
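The density constraint from the transit profile can be reproduced from two Table 1 quantities. The following sketch (function name ours) uses the standard circular-orbit relation between the scaled semi-major axis, the period and the mean stellar density, neglecting the planet's mass.

```python
import math

G_SI = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
SECONDS_PER_DAY = 86400.0


def stellar_density_cgs(a_over_rstar, period_days):
    """Mean stellar density (g cm^-3) for a circular orbit, neglecting the
    planet's mass: rho = 3*pi*(a/R*)^3 / (G * P^2)."""
    period_s = period_days * SECONDS_PER_DAY
    rho_si = 3.0 * math.pi * a_over_rstar ** 3 / (G_SI * period_s ** 2)
    return rho_si / 1000.0  # kg m^-3 -> g cm^-3


# Table 1 lists R*/a = 0.109 and P = 5.42513 days.
rho_circ = stellar_density_cgs(1.0 / 0.109, 5.42513)
print(f"rho*,circ = {rho_circ:.2f} g cm^-3")
```

Because the density scales as the cube of a/R*, small changes in the fitted scaled semi-major axis translate into comparatively large changes in the inferred density.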
Stellar rotation and independent assessment of the stellar radius. We measured
the stellar rotation period as Prot = 6.3 ± 0.2 days from a Lomb-Scargle
periodogram of the light curve. Uncertainty was determined from the half-width at
half-maximum of a Gaussian fit to the periodogram peak. Extended Data Fig. 1
depicts the light curve folded on the rotational period.

The stellar rotation speed as projected along the line of sight, v sin i*, was
estimated from the spectra by artificially broadening an absorption spectrum of a
slowly rotating stellar template of similar temperature to K2-33, acquired using
the same spectrograph and set-up. The range of plausible rotational velocities is
constrained through minimization of the residuals between the broadened
template and the observed spectrum. We find a most likely projected rotation velocity
of v sin i* ≈ 5-9 km s⁻¹. Combined with the rotation period measured from the light
curve, we used the projected rotational velocity to determine an independent
estimate of the stellar radius modulated by the sine of the stellar inclination of
$R_* \sin i_* = v \sin i_* \cdot P_{\rm rot}/2\pi = (0.85 \pm 0.25)\,R_\odot$, where we quote a 95% confidence
interval assuming a uniform distribution in v sin i*.
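The projected radius follows directly from the rotation period and the two ends of the allowed v sin i* range. The sketch below is a minimal check of that arithmetic; the function name and the nominal solar radius are ours, and the uncertainty on the rotation period is ignored.

```python
import math

R_SUN_KM = 695700.0       # nominal solar radius, km
SECONDS_PER_DAY = 86400.0


def r_sin_i_rsun(vsini_kms, p_rot_days):
    """R* sin(i*) in solar radii from v sin(i*) and the rotation period."""
    circumference_km = vsini_kms * p_rot_days * SECONDS_PER_DAY
    return circumference_km / (2.0 * math.pi) / R_SUN_KM


# Ends of the quoted v sin i* range (5-9 km/s) with Prot = 6.3 days.
low = r_sin_i_rsun(5.0, 6.3)
high = r_sin_i_rsun(9.0, 6.3)
centre = 0.5 * (low + high)
half_width = 0.5 * (high - low)
print(f"R* sin i* = {centre:.2f} +/- {half_width:.2f} Rsun")
```

The centre and half-width of the resulting interval agree with the quoted value to within rounding.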

Our value for R* sin i* is consistent with the Stefan-Boltzmann radius within
2σ. Two effects could bias R* sin i* away from the true value of R*: (1) the surface
features dominating the rotational modulation of the light curve may be confined
to a range of stellar latitudes that may not reflect the same velocity field encoded in
the rotational broadening of spectral lines, and (2) the star may have an inclination
resulting in the value of sin i* being substantially <1. If this is the case, the orbit
of K2-33 b is misaligned with the spin of its host star. While R* sin i* provides a
valuable consistency check on stellar radius, we use the Stefan-Boltzmann radius
given its insensitivity to the unknown stellar inclination.
K2 time series photometry treatment. The K2 mission observes fields along
the ecliptic plane for approximately 75 days at a time. K2 photometry possesses
percentage-level systematic signatures from pointing drift and intra-pixel detector
sensitivity variations that must be corrected in order to detect sub-per-cent planet
transits. We acquired such a corrected light curve from the Exoplanet Follow-up
Observing Program (ExoFOP) public website, derived using a 12″ × 16″ rectangular aperture
(Extended Data Fig. 4).

Before modelling the transit profile, we removed the spot modulation pattern
using a cubic basis spline fit with knots spaced by 12 long-cadence measurements,
employing iterative rejection of outliers. We verified that no in-transit
observations were included in the spline fit by phasing the flattened light curve on the
orbital period and inspecting the points excluded from the fit. An artificial gap in
the systematics-corrected light curve from the ExoFOP page was not adequately
captured by the spline fit and, consequently, we excluded from further analysis
those data with BJD values in the range 2,456,936 to 2,456,938, resulting in the
loss of a single transit. We assigned a constant observational uncertainty for each
K2 measurement, determined from the standard deviation in the out-of-transit
light curve (here defined as observations more than 12 h from either side of the
transit centres).
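The out-of-transit uncertainty estimate described here can be sketched in a few lines. The example below is illustrative only: the function name is ours, and the light curve is synthetic (a made-up cadence, reference epoch, noise pattern and box-shaped transit), not K2 data.

```python
import statistics


def out_of_transit_scatter(times, fluxes, t0, period, half_window=0.5):
    """Sample standard deviation of the flux for points lying more than
    `half_window` days (12 h by default) from the nearest transit centre."""
    kept = []
    for t, f in zip(times, fluxes):
        # Offset from the nearest predicted transit centre, in [-P/2, +P/2).
        offset = (t - t0 + 0.5 * period) % period - 0.5 * period
        if abs(offset) > half_window:
            kept.append(f)
    return statistics.stdev(kept)


# Synthetic demonstration (all numbers below are made up for illustration):
# ~30-min cadence, alternating +/-1e-4 "noise", box-shaped 0.23% transits.
PERIOD, T0, CADENCE = 5.42513, 100.0, 0.0204
times = [T0 + CADENCE * k for k in range(-1000, 1000)]
fluxes = []
for k, t in enumerate(times):
    offset = (t - T0 + 0.5 * PERIOD) % PERIOD - 0.5 * PERIOD
    noise = 1e-4 if k % 2 else -1e-4
    dip = 0.0023 if abs(offset) < 0.088 else 0.0
    fluxes.append(1.0 + noise - dip)

sigma_oot = out_of_transit_scatter(times, fluxes, T0, PERIOD)
sigma_all = statistics.stdev(fluxes)
print(f"out-of-transit scatter: {sigma_oot:.2e}; all points: {sigma_all:.2e}")
```

Masking a window around each transit keeps the transit signal itself from inflating the per-point uncertainty, which is why the scatter over all points is noticeably larger in the synthetic case.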

Transit model-fitting analysis. We employed previously established methodology
for fitting transit models to the light curve. The approach uses the BATMAN
software (https://pypi.python.org/pypi/batman-package/2.1.0), based on the
Mandel and Agol analytic light-curve formalism, to generate model transit
profiles. Transit models were numerically integrated to match the ~30-min cadence
of K2 observations. We assumed a linear limb-darkening law for the star, imposing
a Gaussian prior on the linear limb-darkening coefficient, u, based on tabulated
values appropriate for the effective temperature and surface gravity of K2-33.
We also allowed for dilution by light from a second star in the fitting. In
post-processing, we selected only those samples corresponding to dilution levels consistent
with our companion exclusion analysis (Fig. 2).

The directly fitted transit parameters are the scaled semi-major axis (a/R*), the
planet-to-star radius ratio (Rp/R*), the orbital period (P), the time of mid-transit
(t0) and the inclination (i). The multi-dimensional transit parameter space was
explored using an affine invariant implementation of the Markov chain Monte
Carlo algorithm to find the best-fit model and determine parameter uncertainties.
Each observation was weighted equally, resulting in a best-fit log-likelihood of
−1/2 × χ². We followed established methods for Markov chain Monte Carlo
initialization, burn-in treatment, rescaling of data weights, and convergence testing.
Table 1 quotes median transit model values, with uncertainties determined
from the 15.87% and 84.13% percentiles of the parameter posterior distributions.
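Two of the derived quantities in Table 1, the impact parameter and the transit duration, follow from the directly fitted parameters through standard circular-orbit geometry. The sketch below evaluates them at the Table 1 medians; it is a consistency check written for this purpose, not the fitting code, and small differences from the tabulated values are expected because medians of posterior distributions do not transform exactly.

```python
import math


def impact_parameter(a_over_rstar, inc_deg):
    """b = (a/R*) cos(i) for a circular orbit."""
    return a_over_rstar * math.cos(math.radians(inc_deg))


def transit_duration_hours(period_days, a_over_rstar, k, inc_deg):
    """First-to-fourth-contact duration for a circular orbit:
    T14 = (P/pi) * asin( sqrt((1+k)^2 - b^2) / ((a/R*) sin i) ),
    where k = Rp/R*."""
    b = impact_parameter(a_over_rstar, inc_deg)
    sin_i = math.sin(math.radians(inc_deg))
    arg = math.sqrt((1.0 + k) ** 2 - b ** 2) / (a_over_rstar * sin_i)
    return period_days / math.pi * math.asin(arg) * 24.0


# Table 1 medians: R*/a = 0.109, Rp/R* = 0.0476, i = 86.9 deg, P = 5.42513 d.
A_OVER_RSTAR = 1.0 / 0.109
b = impact_parameter(A_OVER_RSTAR, 86.9)
t14 = transit_duration_hours(5.42513, A_OVER_RSTAR, 0.0476, 86.9)
print(f"b = {b:.2f}, T14 = {t14:.2f} h")
```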

The scatter in transit is ~15% larger than the scatter in equal duration intervals
before and after transit. One possible explanation may be spot-crossing events,
given that the star is expected to have a high spot-covering fraction, supported
by the modulation in the unflattened light curve.

Prior to adopting the publicly available light curve, we independently extracted
a light curve from K2 target pixel files using a different photometric pipeline.
Performing the same modelling described above on this second light curve
produced consistent results.

High-spectral-resolution observations and radial velocities. We used the High
Resolution Spectrograph (HIRES) on the 10-m Keck-1 telescope to measure the
radial velocity of K2-33 relative to the Solar System barycentre (Extended Data
Table 1) to confirm its cluster membership, and to constrain the mass of K2-33 b.
The resolution of these spectra is R ≈ 50,000 in the range ~3,600-8,000 Å for
the 2016 epochs and R ≈ 36,000 in the range ~4,800-9,200 Å for the 2015 epoch.

For the first epoch, velocity was derived by cross-correlating the spectrum with
radial velocity standards observed using HIRES in the same spectrograph
configuration. Uncertainty is quantified from the dispersion among measurements
relative to different standards, and over many different spectral orders. For the
three epochs in 2016, systemic radial velocity was measured using the telluric A and
B absorption bands as a fiducial wavelength reference. Assessing all measurements,
the star's systemic radial velocity is −6.6 ± 0.1 km s⁻¹, where we quote a weighted
average and standard deviation.

Limits on companions from the spectroscopic data. We searched for and
excluded distant gravitationally bound companions to K2-33 by looking for trends
in the 265-day radial velocity time series. A 3σ upper limit to any possible
acceleration is 2.6 km s⁻¹ yr⁻¹. Following established methods, the limiting minimum
mass detectable from the radial velocity measurements as a function of orbital
separation rules out additional companions of >0.14M☉ at 3 au and >0.39M☉ at 5 au.
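The scale of these mass limits can be understood from a simple point-mass estimate. The sketch below is our own simplification and not the cited method: it assumes the companion's gravitational pull lies entirely along the line of sight and that the physical separation equals the stated distance, which gives the smallest mass able to produce the limiting acceleration at that separation.

```python
G_SI = 6.674e-11                       # m^3 kg^-1 s^-2
M_SUN_KG = 1.989e30
AU_M = 1.495978707e11
SECONDS_PER_YEAR = 365.25 * 86400.0


def mass_for_acceleration_msun(accel_kms_per_yr, separation_au):
    """Mass (solar masses) whose gravitational pull at `separation_au`
    equals the given acceleration, assuming (our simplification) that the
    pull is entirely along the line of sight: M = a * r^2 / G."""
    accel_si = accel_kms_per_yr * 1000.0 / SECONDS_PER_YEAR
    r_m = separation_au * AU_M
    return accel_si * r_m ** 2 / G_SI / M_SUN_KG


m_at_3au = mass_for_acceleration_msun(2.6, 3.0)
m_at_5au = mass_for_acceleration_msun(2.6, 5.0)
print(f"3 au: {m_at_3au:.2f} Msun; 5 au: {m_at_5au:.2f} Msun")
```

This estimate reproduces the r² scaling between the two quoted separations and comes out somewhat below the quoted limits, as expected if the fuller treatment accounts for orbital geometry that reduces the line-of-sight component.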

We also searched for secondary spectral lines that would arise from a companion
projected within 0.4″ of the primary. No stars as faint as 3% of the brightness
of the primary were detected, though we are blind to companion stars with a small
(<15 km s⁻¹) radial velocity relative to the primary because the spectral lines of
the two stars would not be distinguished.
High-resolution imaging. Using adaptive optics at the Keck-2 telescope on UTC
2011 May 15, we obtained ten 9-s exposures of K2-33. A second set of Keck/NIRC2
images was acquired on UTC 2016 February 17 in a three-point dither pattern with
10-s integrations per dither position, repeated three times. For both epochs, the
narrow-camera optics were used, resulting in a pixel scale of 9.942 mas per pixel;
the final co-added images have a resolution of 0.07″ (full-width at half-maximum).
The observations at the two epochs have the same total integration time, so the
final images have comparable depth.
To estimate sensitivity to point sources as a function of radial distance from the
star, median flux levels and root-mean-square dispersions (σ) were calculated in
incremental annuli centred on the source. An image was constructed with these
characteristics and synthetic sources with full-width at half-maximum equivalent
to that of K2-33 were injected into the image at varying positions and brightnesses.
The synthetic sources were then measured to determine the 5σ detection limit.
Comparing the instrumental magnitude to that of the star produces ΔK for that
annulus.
Non-redundant aperture masking. K2-33 was observed using an aperture mask
on the same 2011 night as the clear aperture images were taken. Aperture masking
interferometry uses an opaque mask with clear holes such that the baseline
between any two holes samples a unique spatial frequency in the pupil plane, and
achieves angular resolution as good as 1/3 × λ/D compared to 1.2 × λ/D, though
at the expense of diminished throughput.

With the nine-hole mask in the NIRC2 camera, we obtained 40 dithered images
using 20-s exposures. Observations of calibrator stars were interspersed with the
targets and obtained in an identical manner. In each image, the mask creates a set
of 36 overlapping interference fringes on the detector. The bispectrum, the complex
triple product of visibilities defined by the three baselines formed from any three
subapertures, is then calculated. The phase of this complex quantity is the closure
phase, which has the advantage of being largely insensitive to phase delays owing
to atmospheric effects or residual uncorrected phase aberrations not sensed by the
adaptive optics system. We followed established procedures for calculating the
closure phase for calibrator stars.
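The fringe count quoted for the nine-hole mask is a matter of counting hole pairs, and the same bookkeeping gives the number of closure triangles. The short sketch below enumerates them; the hole indices are arbitrary labels and carry no information about the actual mask geometry.

```python
from itertools import combinations
from math import comb

N_HOLES = 9

# Each pair of holes forms one baseline (one set of fringes).
baselines = list(combinations(range(N_HOLES), 2))

# Each triple of holes forms one closure triangle (one bispectrum point).
triangles = list(combinations(range(N_HOLES), 3))

# Only (N-1)(N-2)/2 of the closure phases are linearly independent.
n_independent = comb(N_HOLES - 1, 2)

print(len(baselines), len(triangles), n_independent)  # prints: 36 84 28
```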
Limits on companions from the imaging data. Using all three sets of
high-spatial-resolution near-infrared imaging data, we searched for projected companions
to K2-33 (either gravitationally bound or foreground/background sources).
The two epochs of clear aperture data each achieved ΔK > 4.5 mag of contrast
beyond 0.13″ and ΔK > 7.5 mag beyond 1″. In the interim between the observations,
K2-33 moved on the sky by 0.1228″ ± 0.0085″ owing to its parallactic and
proper motion. The combined set of infrared images thus rules out nearly all
unassociated background stars bright enough to produce the transit observed in the
optical. The aperture masking observations are sensitive to more closely projected
(from 0.02″-0.16″) companions in the stellar and substellar mass regimes. We use
these to rule out potential associated companions more massive than 19 Jupiter masses
down to orbital separations of 3 au, and are sensitive to companions with masses
as low as 11-12 Jupiter masses in the range 6-23 au, where quoted companion
masses are model-dependent, conservatively assuming an age of 10 Myr. Finally,
a prior analysis of near-infrared photometry ruled out associated companions
down to masses of 5%-6% of the primary mass at separations of 1.7″-27.5″, or
approximately 250-4,000 au.
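The sky motion between the two adaptive optics epochs can be approximated from the Table 1 proper motions and the observation dates. The sketch below includes proper motion only and ignores the parallactic contribution that the quoted figure includes, so it is an approximation; it also assumes (our assumption) that the tabulated right-ascension component already contains the cos δ factor.

```python
import math
from datetime import date

PM_RA_MAS_YR = -9.8     # Table 1; assumed to include the cos(dec) factor
PM_DEC_MAS_YR = -24.2   # Table 1

# Epochs of the two Keck/NIRC2 imaging runs (UTC dates from the text).
baseline_yr = (date(2016, 2, 17) - date(2011, 5, 15)).days / 365.25

total_pm_mas_yr = math.hypot(PM_RA_MAS_YR, PM_DEC_MAS_YR)
shift_arcsec = total_pm_mas_yr * baseline_yr / 1000.0
print(f"baseline {baseline_yr:.2f} yr, shift {shift_arcsec:.3f} arcsec")
```

A displacement of this size is larger than the 0.07″ resolution of the co-added images, which is what allows a stationary background source hidden behind the star at one epoch to be revealed at the other.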

The imaging limits constrain the brightnesses of any putative companions
in the near-infrared K-band. To approximately convert these contrast limits to
constraints in the optical Kepler bandpass, Kp, we employed a combination of
theoretical evolutionary models and empirical colour-colour relations. The
TRILEGAL simulation discussed below predicts that the mean potential
contaminant towards K2-33 would have infrared colour J − Ks = 0.57 mag, corresponding to
a K-type dwarf. From an empirical optical-infrared Kp-to-Ks conversion, we
conclude that a typical contaminating source would have colour Kp − Ks = 2.00 mag.
We thus gain an additional 2 magnitudes of contrast when converting the NIRC2
contrast curve to the corresponding limits in the Kepler bandpass when
considering unassociated companions.

For associated companions, the imaging constraints natively derived in the
near-infrared were converted to optical limits using pre-main-sequence
evolutionary models. Putative companion masses can be paired with our assumed primary
mass to yield predicted R-band contrasts at 5-10 Myr, where the R-band serves
as a proxy for the Kepler bandpass. Notably, for the clear aperture adaptive optics
region of Fig. 2b, the contrast achieved beyond 30 au is better than represented in
the figure owing to limitations of the models, which do not extend below 10MJup
(corresponding to ΔKp > 9.5 mag).

We rule out nearly all stellar and brown-dwarf mass companions to K2-33, 
with the exception of a narrow swath of parameter space corresponding to ultra- 
low-mass stars or brown dwarfs separated by 1-3 au from the primary (Fig. 2). 
The physically permitted (as opposed to unexcluded) separation range of any 
hypothetical diluting companion is even smaller, considering that the inferred 
inner edge of the disk is at 2 au. 

Galactic structure model and intracluster contamination. In addition to searching
directly for sources that could contaminate the K2 light curve, we estimated
the expected surface density of such sources as a function of magnitude using the
TRILEGAL version 1.6 model (http://stev.oapd.inaf.it/cgi-bin/trilegal) of the
Milky Way Galaxy. Notably, TRILEGAL does not include the local extinction due
to gas and dust associated with Upper Scorpius itself, and therefore produces an
upper limit to the field-star source density. We simulated a 1-deg² field and scaled
the resulting numbers first to the 10 × 10 arcsec² field of view of the Keck/NIRC2
images and the 12 × 16 arcsec² K2 photometric aperture, and then to the
unexcluded regions of parameter space in Fig. 2a. We found the TRILEGAL prediction
for the expected surface density of sources to the Ks < 18.0 mag limit (5σ) of the
NIRC2 data, and translated it to <0.15 sources per NIRC2 field, consistent with
our detection of none. Within the K2 photometry aperture, <0.3 unassociated
sources are expected to the same magnitude limit, corresponding to a mean optical
brightness of Kp = 18.6 mag.

By considering the surface density of sources expected from integrating essen- 
tially all the way through the galaxy, a maximum of three sources are expected in 
the K2 aperture, having mean optical brightness Kp = 23.1 mag, which is too faint 
to explain the transit depth. Indeed, we have ruled out nearly all companions with 
projected separations larger than 0.04″, as well as effectively all of those inside 0.04″;
fewer than 4 x 10~° sources are expected in the remaining unexcluded parameter 
space of Fig. 2a. An even smaller number of sources are expected to be eclipsing 
stellar binaries. Therefore, in addition to not detecting any contaminants in the 
high-spatial-resolution imaging and spectroscopic data, we argue that essentially 
none are expected. 

A similar source-density argument can be used to constrain the probability of
contaminants with nearly identical proper motions to K2-33, probably association
members that are foreground or background to K2-33. The multi-epoch adaptive
optics imaging cannot rule out closely projected sources within 0.02″ (the inner
limit probed by the aperture masking observations) that are also co-moving with
K2-33 between 2011 and 2016. From the observed Upper Scorpius mass function
and width of the association on the sky, we estimated a source density of about
16 members deg⁻². Thus, fewer than 2 × 10⁻⁹ co-moving contaminants are
expected within 0.02″ of K2-33.
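The source-count arguments in this section rest on scaling a surface density to a sky area. The sketch below illustrates two of the scalings using only numbers stated in the text: the field-star limit per NIRC2 field rescaled to the K2 aperture, and the density of association members applied to a circle of radius 0.02″. The helper name is ours, and a uniform surface density is assumed.

```python
import math

ARCSEC2_PER_DEG2 = 3600.0 ** 2


def expected_count(density_per_deg2, area_arcsec2):
    """Expected number of sources in a sky area, for a uniform density."""
    return density_per_deg2 * area_arcsec2 / ARCSEC2_PER_DEG2


# (1) Field stars: rescale the <0.15 sources per 10" x 10" NIRC2 field
#     to the 12" x 16" K2 aperture (same magnitude limit).
density_limit = 0.15 / (10.0 * 10.0) * ARCSEC2_PER_DEG2   # sources per deg^2
n_k2_aperture = expected_count(density_limit, 12.0 * 16.0)

# (2) Co-moving association members: about 16 per deg^2 inside a circle
#     of radius 0.02".
n_comoving = expected_count(16.0, math.pi * 0.02 ** 2)

print(f"K2 aperture: < {n_k2_aperture:.2f} field sources")
print(f"within 0.02 arcsec: ~ {n_comoving:.1e} co-moving members")
```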
False-positive probability analysis. Eclipsing stellar binaries, when diluted by 
the light of a third star, can produce light curves that masquerade as a planetary 
transit. These false positives come in three broad classes: (1) undiluted eclipsing 
stellar binaries, (2) background (and foreground) eclipsing stellar binaries where 
the eclipses are diluted by the target star, and (3) bound eclipsing stellar binaries 
in hierarchical triple systems. 

We used the VESPA program (https://pypi.python.org/pypi/VESPA/0.4.7) to
compare the likelihood of each binary scenario against the planetary interpretation.
As input for the calculation, we provide the K2 light curve, the stellar parameters,
and to be as conservative as possible, we adopt our least stringent imaging
constraints: the 2011 NIRC2 clear contrast curve, and the aperture masking limits.
Even in this minimally constraining scenario, we find a false-positive probability of 
<1 × 10⁻¹¹ from VESPA, as expected given our exclusion in Fig. 2 of essentially all
the modelled scenarios. Notably, however, VESPA does not account for substellar 
objects, pre-main-sequence evolution, extinction, or the unknown prior probability 
of planets around 5-10-Myr-old stars. 
Implications of hierarchical triple scenarios. The conjectured hierarchical triple
configuration is argued elsewhere as unlikely on the basis of population statistics;
however, we must still consider the possibility that K2-33 b has a larger radius
owing to dilution of the transit depth by a luminous companion to K2-33. We first
consider the case in which the planet orbits K2-33, but the transit depth is diluted
by an undetected secondary. In this scenario, the ratio of the true planet radius to
the observed planet radius (that is, not accounting for dilution) is
$R_{p,\rm true}/R_{p,\rm obs} = \sqrt{1 + F_{\rm sec}/F_{\rm pri}}$, where $F_{\rm sec}/F_{\rm pri}$ is the optical flux ratio between the
secondary and primary. For ΔKp in the range 2-6 mag, the true planet radius is at
most 7.6% larger, within our quoted uncertainties.
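The size of the dilution correction in this first case depends only on the contrast of the undetected secondary. The sketch below evaluates the radius ratio at the two ends of the unexcluded contrast range; the function names are ours.

```python
import math


def flux_ratio(delta_mag):
    """F_sec / F_pri for a secondary fainter by `delta_mag` magnitudes."""
    return 10.0 ** (-0.4 * delta_mag)


def radius_inflation(delta_mag):
    """R_p,true / R_p,obs when the planet orbits the primary and the
    transit is diluted by the secondary's light."""
    return math.sqrt(1.0 + flux_ratio(delta_mag))


for dm in (2.0, 6.0):
    excess = (radius_inflation(dm) - 1.0) * 100.0
    print(f"dKp = {dm:.0f} mag: true radius larger by {excess:.1f}%")
```

The correction is largest for the brightest permitted companion (ΔKp = 2 mag) and becomes negligible towards ΔKp = 6 mag.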

Now, we consider the case in which the planet orbits an undetected companion
to K2-33. In this scenario, the planet radius is $R_p = R_{\rm sec}\sqrt{\delta_0 (1 + F_{\rm pri}/F_{\rm sec})}$, where
$R_{\rm sec}$ is the secondary radius, and $\delta_0$ is the observed transit depth. Using ΔR as a
proxy for ΔKp, we found the secondary radius implied by the optical brightness
decrement using evolutionary models valid for 5-10 Myr. For ΔKp in the range
2-6 mag, we found that the implied planet radius is in the range
(0.56-1.85)RJup. At these ages, such radii correspond to planet masses of <13MJup.
Only for ΔKp > 6 mag does the implied mass exceed the nominal brown-dwarf
minimum mass. However, coeval eclipsing brown dwarfs are unlikely to produce
eclipse depths of >50% (corresponding to contrasts of ΔKp > 5.8 mag).
Furthermore, we argue that scenarios involving eclipsing brown dwarfs 1-3 au
from K2-33 are extremely unlikely, for several compounding reasons: (1) the
observed frequency of brown-dwarf companions to M-dwarfs is a few per cent;
(2) in the restricted domain of a = 1-3 au, the frequency is lower still; and (3) the
frequency of eclipsing brown dwarf pairs so similar in size and temperature that
the primary and secondary eclipses are indistinguishable is smaller still. Finally,
the tenuous dust disk with inner edge at 2 au is further evidence in favour of the
single-star scenario.
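For the case in which the planet orbits the undetected companion, the observed depth fixes the ratio of the planet radius to the secondary radius once the contrast is specified. The sketch below computes only that ratio, from the Table 1 radius ratio and the contrast; converting it to a planet radius requires a secondary radius from evolutionary models, which is not attempted here. The function name is ours.

```python
import math

OBSERVED_DEPTH = 0.0476 ** 2   # (Rp/R*)^2 from Table 1, about 0.23%


def planet_to_secondary_radius(delta_mag, depth=OBSERVED_DEPTH):
    """R_p / R_sec when the planet transits a secondary that is fainter
    than the primary by `delta_mag`: sqrt(depth * (1 + F_pri/F_sec))."""
    f_pri_over_f_sec = 10.0 ** (0.4 * delta_mag)
    return math.sqrt(depth * (1.0 + f_pri_over_f_sec))


ratio_at_2 = planet_to_secondary_radius(2.0)
ratio_at_6 = planet_to_secondary_radius(6.0)
print(f"Rp/Rsec = {ratio_at_2:.3f} at dKp = 2 mag, {ratio_at_6:.3f} at 6 mag")
```

The ratio approaches unity as the contrast nears the limit at which a total eclipse of the secondary would be needed, consistent with the requirement that eclipse depths cannot exceed 100%.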

Cluster age. The age of Upper Scorpius is constrained to be 5-10 Myr, from a
variety of considerations. Absence of dense molecular gas or protostars in Upper
Scorpius implies that star formation has ceased in the region, while the presence
of protoplanetary disks around a significant fraction of members indicates that
planet formation is ongoing. However, the precise age of the association is
currently debated. An early kinematic analysis, in which the motions of high-mass
members were traced back in time to the point of closest proximity to one another,
suggested an age of 5 Myr (ref. 60). The first Hertzsprung-Russell diagram analysis
of the full stellar population, spanning from the highest to the lowest masses, also
determined an age of 5 Myr without appreciable dispersion. Most subsequent age
determinations using theoretical evolutionary models in the Hertzsprung-Russell
diagram arrived at the same consistent age of 5-6 Myr for massive main-sequence
turnoff stars, low-mass stars, as well as substellar mass objects.

However, the association age has also been determined to be closer to 10 Myr
from analyses of low-mass members and the intermediate-mass pre-main-sequence
and main-sequence population, main-sequence turn-off stars, and the
supergiant Antares. Emerging evidence from double-lined eclipsing binaries
also supports an age in the range of 7-11 Myr, as does an updated kinematic
analysis. Despite the lack of consensus on the precise age of Upper Scorpius,
the full error-inclusive range of estimates in the literature (3-13 Myr) places the
association at a critically important stage in the planet formation process, when
most primordial disks have dispersed.
K2-33 b in the context of other claimed young planets. While several secure 
short-period planets have been found in orbit around stars in benchmark open 
clusters including the 600-800-Myr-old Hyades and Praesepe, the evidence
for planets at younger ages is mixed. Direct imaging has revealed (5-10) MJup
'planetary mass' companions located at large semi-major axes from several stars having
ages of a few tens to a few hundreds of million years. Additionally, there are strong 
indications of ongoing planet formation in many 1-3-Myr-old circumstellar disks, 
based on the observed radial structure of dust. 

However, there are no confirmed planets in the Jupiter or sub-Jupiter mass range
with ages less than those corresponding to the late-heavy-bombardment era in our
own Solar System. Both TW Hya and PTFO 8-8695 have been claimed to host
hot Jupiter candidates detected via radial velocity or transit methods, but neither
claim has stood up to scrutiny. K2-33 b, at an age of 5-10 Myr, is in contrast a
secure transiting planet. It is slightly larger than Neptune and its mass is probably
similar to Neptune's.
Code availability. We have opted not to make available the custom codes
employed in the various elements of our data reduction and analysis, as they
either are not of general-purpose use, lack sufficient documentation, or are already
publicly available. The results of the computer processing are provided as Source
Data. The raw data are publicly available, either currently (K2) or after a limited 
proprietary period (Keck, ALMA). 
Extended Data Figure 1 | K2 light curve for K2-33 phased on the stellar
rotation period of 6.3 days. Semi-sinusoidal brightness variations due to
rotational modulation of starspots. Point colour indicates the relative time
of observation, with grey corresponding to earlier in the campaign and
dark blue corresponding to later times. Brightness is lowest when the most
heavily spotted hemisphere of the stellar surface is along the line of sight.
The shape and evolution of the variability pattern depends on the number,
geometry, distribution, and lifetime of spots, along with any latitudinal
gradient in the rotational speed (differential rotation). The transits of
K2-33 b are visible by eye in this figure and are too narrow in rotational
phase to be attributed to any feature on or near the stellar surface.
Extended Data Figure 2 | Model-dependent age of K2-33. a, Solid
lines show mean stellar density as a function of effective temperature for
pre-main-sequence stars having different ages, according to theoretical
models. Grey points represent plausible combinations of density and
temperature for K2-33 as determined by light-curve fits and stellar
spectroscopy. b, Distribution of implied stellar age based on temperature,
density, and pre-main-sequence models. The implied age of 2-7 Myr is
consistent with the age we adopted of 5-10 Myr, derived independently.
Dark- and light-grey shaded regions indicate 68% and 95% confidence
intervals, respectively.
[Panel b axes: stellar age (Myr) versus probability density.]
Extended Data Figure 3 | Apparent radial velocity variations of K2-33.
Line-of-sight velocities and 1σ uncertainties (standard deviations,
indicated by error bars) with respect to the Solar System barycentre from
Keck/HIRES are indicated. Radial velocities are mean-subtracted, and the
abscissa shows the orbital phase of K2-33 b measured from K2 photometry
(mid-transit occurs at zero orbital phase). We rule out radial velocity
variations larger than 300 m s⁻¹ at 68.3% confidence, corresponding to a
1.2 MJup planet mass. Curves show the expected radial velocity variations
for planets having circular orbits and different masses Mp. Radial velocities
due to a 1.0 MJup planet (blue) are consistent with our observations, while a
4.0 MJup planet (red) is ruled out at high confidence.
Extended Data Figure 4 | Images of K2-33. a, K2 target pixel file. b, Sloan
Digital Sky Survey (SDSS) optical image. c, Keck/NIRC2 K-band image.
Extents of the K2 target pixel file, K2 photometric aperture, and NIRC2
image are shown respectively with black, green, and purple boundaries. In
each image, north is up and east is left. Three other sources identified by
SDSS reside within the K2 photometric aperture, one of which is a galaxy.
All are 7.3-10.1 magnitudes fainter than K2-33 in the SDSS r-filter and
below the detection limit of the NIRC2 images, and are thus too faint to
produce the observed transits.
Extended Data Figure 5 | Sensitivity to non-comoving sources in the
vicinity of K2-33. The blue X marks the star's position in 2011. Between
2011 and 2016, the star moved by 0.1228″ ± 0.0085″ (red X) owing to
proper motion. Contours show the K-band sensitivity to non-comoving
stars from adaptive optics imaging from both epochs. The 2011 data
set included non-redundant aperture masking, and provided tighter
constraints. The combined sensitivity to non-comoving objects is the
maximum contrast achieved for either data set. Owing to stellar proper
motion, we achieved K-band contrasts of >3.3 mag throughout the
ΔRA-Δdec. plane, even at the 2011 and 2016 positions of K2-33.
Extended Data Table 1 | Keck/HIRES radial velocities for K2-33

Epoch (UTC)              BJD            Radial velocity (km s⁻¹)
2015-06-01 12:57:36.0    2457175.040    -6.61 ± 0.58
2016-02-02 15:30:14.4    2457421.146    -6.66 ± 0.30
2016-02-04 15:23:02.4    2457423.141    -6.60 ± 0.30
2016-02-21 14:29:45.6    2457440.104    -6.40 ± 0.30

Line-of-sight velocities and 1σ uncertainties (standard deviations) with respect to the Solar System barycentre. UTC, Coordinated Universal Time.
A hot Jupiter orbiting a 2-million-year-old
solar-mass T Tauri star

J. F. Donati, C. Moutou, L. Malo, C. Baruteau, L. Yu, E. Hébrard, G. Hussain, S. Alencar, F. Ménard, J. Bouvier,
P. Petit, M. Takami, R. Doyon & A. Collier Cameron


Hot Jupiters are giant Jupiter-like exoplanets that orbit their host 
stars 100 times more closely than Jupiter orbits the Sun. These 
planets presumably form in the outer part of the primordial disk 
from which both the central star and surrounding planets are born, 
then migrate inwards and yet avoid falling into their host star. It
is, however, unclear whether this occurs early in the lives of hot
Jupiters, when they are still embedded within protoplanetary disks,
or later, once multiple planets are formed and interact. Although
numerous hot Jupiters have been detected around mature Sun-like
stars, their existence has not yet been firmly demonstrated for young
stars, whose magnetic activity is so intense that it overshadows the
radial velocity signal that close-in giant planets can induce. Here we 
report that the radial velocities of the young star V830 Tau exhibit 
a sine wave of period 4.93 days and semi-amplitude 75 metres per 
second, detected with a false-alarm probability of less than 0.03 per 
cent, after filtering out the magnetic activity plaguing the spectra. 
We find that this signal is unrelated to the 2.741-day rotation period 


Figure 1 | Brightness map of V830 Tau and fit to the LSD profiles.
a, Logarithmic brightness at the surface of V830 Tau as derived with
Doppler imaging. Cool spots show as brown features and bright plages
show as blue features. The rotation axis of the star is tilted at 55° to the
line of sight, and the projected equatorial rotation velocity is equal to
30.5 km s⁻¹ (ref. 12). The star is shown in a flattened polar view, with
the pole in the centre and the equator depicted as a bold circle. Ticks
of V830 Tau and we attribute it to the presence of a planet of mass
0.77 times that of Jupiter, orbiting at a distance of 0.057 astronomical
units from the host star. Our result demonstrates that hot Jupiters
can migrate inwards in less than two million years, probably as a
result of planet-disk interactions.

Very few exoplanets have yet been discovered around young, forming
Sun-like stars aged less than ten million years (Myr), called the
T Tauri stars, either through the radial velocity variations or the
photometric transits they induce in the light of their stars. Yet detections of
young planets are key for our understanding of how planetary systems
form, and this is especially true of young hot Jupiters, which are thought
to have a critical impact on the early architecture of these systems.
The first claimed detection of a young hot Jupiter orbiting a T Tauri
star was quickly refuted; the reported periodic radial velocity
fluctuations were finally attributed to activity and to cool spots at the stellar
surface. The recent candidate detection of a transiting hot Jupiter
around a T Tauri star is still pending confirmation.
outside the image mark the phases of observations. b, Observed
(black line) and modelled (red) LSD profiles of V830 Tau throughout
our November-December 2015 run. LSD profiles before their filtering
from lunar contamination are also shown (cyan lines). Numbers on the
right of each profile indicate the rotation cycle. Cycles 6.079 to 7.190 are
the most affected, contamination being much smaller or negligible in all
other observations. I/Ic, normalized intensity.
T Tauri stars are known to harbour cool spots and bright features
(plages) on their surfaces, generating radial velocity fluctuations
with semi-amplitudes of several kilometres per second, that is,
much larger than the perturbations expected from a putative planet,
even for close-in massive hot Jupiters inducing typical radial velocity
signals of ~0.1 km s⁻¹. Detecting hot Jupiters around T Tauri stars
through velocimetry or photometry is thus quite challenging and
requires efficient tools for filtering out the dominant jitter that activity
induces in the spectra and light curves of young stars. We recently
proposed a new method to achieve this goal, whose first applications
to non-accreting (weak-line) T Tauri stars proved promising though
inconclusive.

V830 Tau is a ~2-Myr-old solar-mass T Tauri star contracting
towards the main sequence and currently spinning once in 2.741 days,
that is, ~10 times faster than the Sun. Evolutionary models suggest
that it is fully or largely convective. Unlike 80% of the T Tauri stars
in the Taurus star-forming region, V830 Tau exhibits no significant
infrared excess, implying that most of its inner accretion disk has
already dissipated. This is consistent with its status as a non-accreting
weak-line T Tauri star, and makes it an ideal place to look for the presence
of hot Jupiters at an early stage of star and planet formation.

In late 2015, we collected 48 high-resolution spectra of V830 Tau (see
Extended Data Table 1) as part of the MaTYSSE Large Programme aimed
at detecting hot Jupiters around weak-line T Tauri stars. Applying
least-squares deconvolution (LSD) to our spectra, we derived accurate
average line profiles and their temporal modulation over ~15 rotation
cycles. Longitudinal magnetic fields were also derived from our circularly
polarized data and the Zeeman signatures that fields generate in
spectral lines. Using tomographic techniques inspired by medical
imaging, one can reconstruct distributions of spots and plages at the
surfaces of rotating cool active stars from sets of densely sampled line
profiles covering several rotation cycles. This method, called Doppler
imaging, can also probe the photospheric shear associated with
surface differential rotation through the amount of twisting it generates
in brightness maps.

Our Doppler imaging code was previously applied to a small set of
15 LSD profiles of V830 Tau, from which the distribution of surface
features and the differential rotation pattern were recovered; it even
Figure 2 | Raw, filtered and residual radial velocities of V830 Tau.
a, Top, raw radial velocities of V830 Tau (open symbols and 1σ error bars)
and the model inferred with Doppler imaging (cyan line); the model
slowly evolves with time as a result of differential rotation. Open circles,
squares and triangles depict ESPaDOnS, NARVAL and ESPaDOnS/GRACES
data. Middle, activity-filtered radial velocities and best sine fit to the data
(cyan line). The period and semi-amplitude of the planet radial velocity
signal are equal to 4.93 ± 0.05 days and 75 ± 11 m s⁻¹ (1σ error bars).
Bottom, residual radial velocities once the planet signal is removed and
activity is filtered out, with a final root-mean-square dispersion of 48 m s⁻¹.
Different colours code different rotation cycles. b, Activity-filtered radial
velocities (with 1σ error bars) phase-folded on the planet orbital period
of 4.93 days. Although the fit to the data is marginally better with an
eccentric orbit (dashed line) than with a circular orbit (solid line), the
significance of the derived eccentricity (0.30 ± 0.15) is too low to be
reliable.
Figure 3 | Periodogram of the radial velocities and activity of V830 Tau.
a, Periodograms of the raw (top), filtered (middle) and residual (bottom)
radial velocities shown in Fig. 2a. The black line is for the full set, while
the dashed red, green, blue and pink lines are for the first half, the
second half, the even points and the odd points only; the purple dashed
lines are the periodograms once the 15 radial velocity points affected by
lunar contamination are removed and the orange dashed lines are the
periodograms once the 6 radial velocity points strongly affected by lunar
contamination are removed. (Similar results are obtained when analysing
subsets of filtered radial velocities from a global Doppler imaging

suggested the potential presence of a hot Jupiter, though with a very low 
confidence level. The version of the Doppler imaging code used here 
implements a novel technique to filter out lunar contamination (which 
plagues spectra collected in non-photometric conditions), yielding 
good results when phase coverage is dense. Applying the code to our 
new set of 48 LSD profiles of V830 Tau yields the map and fit shown in 
Fig. 1. We again clearly detect differential rotation (see Extended Data 
Fig. 1) and confirm that it is ~3 times weaker than that of the Sun, 
reflecting the fact that V830 Tau is largely or fully convective.

From the brightness image reconstructed with Doppler imaging, we 
derive the model radial velocity curve that V830 Tau should exhibit 
if all profile perturbations were attributable to surface features and 
differential rotation. By subtracting these modelled radial velocities 
from the observed ones (both computed as the first moment of the LSD 
profiles), we obtain the activity-filtered radial velocities of V830 Tau, 
whose amplitude is typically ~10 times lower than the raw radial 
velocities (see Fig. 2). A clear radial velocity signal, with a period of
4.93 ± 0.05 days and a semi-amplitude of 75 ± 11 m s⁻¹, is detected in
the activity-filtered radial velocities at a confidence level > 99.97% (see
Figs 2 and 3). Using a Bayesian approach, we find that the 4.93-day
peak is at least 10° times more likely than any of the other features in the 
periodogram (see Extended Data Fig. 2a). The regular phase coverage 
modelling, or when Doppler imaging filtering is applied to subsets of LSD
profiles.) The stellar rotation period (2.741 days), its first harmonic and
the planet orbital period (4.93 days) are depicted with vertical dashed
lines. The horizontal dotted and dashed lines trace the 33%, 10%, 3% and
1% false alarm probabilities (FAP). The planet signal in the filtered radial
velocities is detected in the full set with a FAP < 0.03%. b, Periodogram of
the line-of-sight projected (longitudinal) magnetic field, a reliable activity
proxy, featuring a clear peak at the stellar rotation period but no power at
the planet orbital period.


of our data also allows us to check that the 4.93-day signal is present 
in smaller subsets (for example, first and second half, even and odd 
points) though, as expected, with a lower confidence level. Similarly, 
we checked that our detection holds when profiles affected by lunar 
contamination are excluded (see Fig. 3 and Extended Data Fig. 2a) and 
when differential rotation is neglected. Periodograms of the longitudinal
fields and of the Hα emission, both reliable proxies for the activity
jitter plaguing radial velocity curves, show no power at a period of
4.93 days (see Fig. 3b and Extended Data Fig. 2b), demonstrating that 
the signal we report is unrelated to activity. 
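
As an illustration of the periodogram test described above, the following minimal sketch computes a generalized (floating-mean) Lomb-Scargle periodogram for a noise-free synthetic radial velocity series built with the reported period and semi-amplitude. The sampling times, the period grid and the function name `gls_power` are assumptions made for this example; the text does not state which periodogram implementation was used, so this should not be read as the authors' code.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def gls_power(times, values, period):
    """Generalized Lomb-Scargle power (0..1) at one trial period.

    Equals the fractional chi-square reduction obtained by fitting a
    sinusoid plus a constant offset with equal weights.
    """
    omega = 2.0 * math.pi / period
    c = [math.cos(omega * t) for t in times]
    s = [math.sin(omega * t) for t in times]
    y_m, c_m, s_m = _mean(values), _mean(c), _mean(s)
    yy = _mean([y * y for y in values]) - y_m * y_m
    yc = _mean([y * ci for y, ci in zip(values, c)]) - y_m * c_m
    ys = _mean([y * si for y, si in zip(values, s)]) - y_m * s_m
    cc = _mean([ci * ci for ci in c]) - c_m * c_m
    ss = _mean([si * si for si in s]) - s_m * s_m
    cs = _mean([ci * si for ci, si in zip(c, s)]) - c_m * s_m
    d = cc * ss - cs * cs
    return (ss * yc * yc + cc * ys * ys - 2.0 * cs * yc * ys) / (yy * d)


# Hypothetical sampling: 48 epochs spread irregularly over 37 days.
rng = random.Random(1)
times = sorted(rng.uniform(0.0, 37.0) for _ in range(48))

# Noise-free synthetic signal with the reported period and semi-amplitude.
P_TRUE, K_TRUE = 4.93, 75.0  # days, m/s
rv = [K_TRUE * math.sin(2.0 * math.pi * t / P_TRUE + 0.3) for t in times]

# Scan trial periods from 1.5 to 10 days in steps of 0.01 day.
periods = [1.5 + 0.01 * k for k in range(851)]
powers = [gls_power(times, rv, p) for p in periods]
best_period = periods[powers.index(max(powers))]
print(f"strongest peak at {best_period:.2f} d")
```

With real data the peak height would be compared against false-alarm thresholds; here the series is noise-free, so the sketch only shows that the peak falls at the injected period.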

We interpret this radial velocity signal as being caused by a giant
planet of mass 0.77 ± 0.15 Jupiter masses in a circular orbit around
V830 Tau at a distance of 0.057 ± 0.001 au (see Extended Data Table 2).
Although the filtered radial velocities are marginally better fitted by
an eccentric orbit (e = 0.30 ± 0.15, see Fig. 2), we still favour a circular
orbit given the large error bar on the eccentricity. Removing the
planet signal from the original data and repeating the activity-filtering
process yields residual radial velocities with a root-mean-square
dispersion of 48 m s⁻¹ (that is, consistent with the average noise level,
see Extended Data Table 1) and no peak with a FAP < 30% left in the
periodogram. Using the alternative option of fitting both the brightness
map and the planet parameters simultaneously yields identical results
Figure 4 | Adjusting the planet parameters while modelling activity.
Variations in the reduced χ² (shown as the colour scale) of the Doppler
imaging fit to the LSD profiles for a given level of spottedness and
assuming the presence of a planet in circular orbit, for a range of orbital
periods Porb and semi-amplitudes K of the radial velocity signature.
(This is a two-dimensional cut from a three-dimensional map, with the
phase of the radial velocity signal also included as a search parameter.)
The location of the minimum and local paraboloid curvature yield the
optimal values of Porb and K and their respective 1σ error bars, equal to
4.94 ± 0.05 days and 82 ± 10 m s⁻¹, in agreement with the results of our
main filtering technique. The outer colour contour traces the projected
99.99% confidence interval, corresponding to a χ² increase of 21.1 for
a three-parameter fit to the 2,208 data points of the LSD profiles. With
respect to our best model incorporating a planet, a model with no planet
corresponds to a χ² increase of 82, implying that the planet is detected with
a FAP <10~'°, much smaller than the FAP derived from the periodogram 
of the 48 radial velocity points (see Fig. 3) thanks to the larger number of 
data points in the fitted LSD profiles. 
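
The χ² threshold quoted in this caption can be checked with the closed-form tail probability of a χ² distribution with three degrees of freedom. The sketch below assumes that the confidence level maps onto the χ² increase through the standard χ² distribution for three jointly fitted parameters; the function name is ours and is not taken from the paper.

```python
import math


def chi2_sf_3dof(x):
    """Survival function P(chi2 > x) for 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


# Chi-square increase of 21.1 quoted for the 99.99% contour.
p_contour = chi2_sf_3dof(21.1)
print(f"tail probability at 21.1: {p_contour:.2e}")

# Chi-square increase of 82 between the no-planet and best planet models.
p_no_planet = chi2_sf_3dof(82.0)
print(f"tail probability at 82: {p_no_planet:.1e}")
```

Under this assumption, an increase of 21.1 leaves a tail probability of about one part in ten thousand, and an increase of 82 leaves a tail probability many orders of magnitude smaller.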


and demonstrates that our optimal model including a planet is orders 
of magnitude more likely than a model with no planet (see Fig. 4). 
Simulations in conditions identical to those of our observations yield 
results in close agreement with those of Figs 2 and 3 (see Extended 
Data Figs 3 and 4), further demonstrating that our filtering process 
induces no spurious peak and that the radial velocity signal we detect 
is unrelated to activity. 
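
The planet parameters quoted above follow from the radial velocity signal through Kepler's third law and the standard semi-amplitude relation for a circular orbit. The sketch below is a rough consistency check only. It assumes a stellar mass of exactly one solar mass, a planet mass negligible compared with that of the star, and an orbit lying in the stellar equatorial plane (inclination of 55°, the tilt of the rotation axis given in Fig. 1). These assumptions, the function names and the physical constants are ours.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
M_JUP = 1.898e27     # kg
AU = 1.496e11        # m
DAY = 86400.0        # s


def semi_major_axis_au(period_days, m_star_msun):
    """Kepler's third law, neglecting the planet mass."""
    p = period_days * DAY
    a = (G * m_star_msun * M_SUN * p ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    return a / AU


def planet_mass_mjup(k_ms, period_days, m_star_msun, incl_deg):
    """Planet mass from the RV semi-amplitude, circular orbit, M_p << M_star."""
    p = period_days * DAY
    m_sin_i = (k_ms * (p / (2.0 * math.pi * G)) ** (1.0 / 3.0)
               * (m_star_msun * M_SUN) ** (2.0 / 3.0))
    return m_sin_i / math.sin(math.radians(incl_deg)) / M_JUP


# Reported signal: P = 4.93 d, K = 75 m/s; assumed M_star = 1 M_sun, i = 55 deg.
a = semi_major_axis_au(4.93, 1.0)
m = planet_mass_mjup(75.0, 4.93, 1.0, 55.0)
print(f"a = {a:.3f} au, M_p = {m:.2f} M_Jup")
```

Both values land close to the quoted 0.057 au and 0.77 Jupiter masses, which suggests that the quoted mass already accounts for the orbital inclination rather than being a minimum mass.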

A careful re-analysis of our original data demonstrates that, despite
being affected by intrinsic variability from the host star, they nonetheless
confirm the presence of the planet signal detected in our new
data. In particular, applying our filtering analysis to the main subset
of our original data (featuring dense enough coverage for our technique
to perform reliably) yields filtered radial velocities agreeing well
with those derived from the new data. Fitting both sets together further
improves the confidence level at which the planet is detected (see
Extended Data Fig. 2c).

The detection we report among the small sample of weak-line 
T Tauri stars (~10) already studied with MaTYSSE suggests that 
close-in giant planets are potentially more frequent around T Tauri stars 
than around mature low-mass stars, ~1% of which are known to host 
hot Jupiters. Poisson statistics, however, indicate that there is still
a ~10% chance that hot Jupiters are similarly frequent for both popu- 
lations. A more quantitative conclusion will have to await a thorough 
analysis of all MaTYSSE data, complemented by the large T Tauri star 
survey to be carried out with SPIRou, the new near-infrared cryogenic 
spectropolarimeter and velocimeter to be installed at CFHT in 2017. 
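
The ~10% figure can be recovered from Poisson statistics in a few lines. The sketch assumes a sample of exactly 10 stars and a hot Jupiter occurrence rate of exactly 1%, rounding the approximate values given above.

```python
import math

n_stars = 10          # approximate number of weak-line T Tauri stars studied
occurrence = 0.01     # fraction of mature low-mass stars hosting a hot Jupiter

expected = n_stars * occurrence              # mean number of detections expected
p_at_least_one = 1.0 - math.exp(-expected)   # Poisson probability of >= 1 detection
print(f"P(at least one hot Jupiter) = {p_at_least_one:.3f}")
```

In other words, if young stars hosted hot Jupiters at the same rate as mature stars, finding one in a sample of this size would still happen by chance roughly one time in ten.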

The broad consensus is that hot Jupiters form beyond a few astronomical
units from their host stars and migrate inwards to their
eventual close-in orbits. The delivery of hot Jupiters to these orbits
may result from planet-disk interactions (disk migration) or from
dynamical interactions with a planetary or a stellar companion, followed
by orbital circularization (high-eccentricity migration). These
highly debated migration channels were proposed as tentative explanations
for hot Jupiters with orbits either well aligned or misaligned with
the spin axis of their host stars. Our detection of a hot Jupiter on a
~5-day circular (or moderately eccentric) orbit around a 2-Myr-old
star is most naturally explained by hot Jupiter delivery through
disk migration, which produces hot Jupiters with nearly circular
orbits, rather than by planet-planet scattering, which generates highly
eccentric (e > 0.9) hot Jupiters whose circularization timescales are at
least 100 to 1,000 times longer than the age of V830 Tau given typical
tidal dissipation factors in giant planets. Our result thus lends strong
support to the theory of giant planet migration in gaseous protoplanetary
disks and confirms that the architecture of planetary systems, on
which hot Jupiters have a strong impact, is probably dynamic from the
very early stages of planetary formation.

Global models of planet formation and evolution show that giant 
planets can reach a mass and an orbital period similar to those of 
V830 Tau b in 2-3 Myr, whether formation occurs through core accretion
or disk gravitational instability. Large uncertainties in these models
make it difficult to accurately predict the occurrence rate of hot Jupiters 
and thus to associate V830 Tau b with either formation scenario. The 
~300 G dipole of the star's magnetic field is strong enough to have
disrupted the central 0.06 au of the now-dissipated disk for accretion
rates < 2 × 10⁻¹⁰ M⊙ yr⁻¹. For rates of ~10⁻⁹ M⊙ yr⁻¹, more typical
of those of the classical T Tauri stars that still feed from their disks,
a > 700 G dipole field is required, which is again compatible with observations
of classical T Tauri stars similar to V830 Tau. This shows
that the field of V830 Tau may well have stopped the planet within the
magnetospheric gap at the end of its disk migration, and saved it from
falling into the star.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Spectropolarimetry with ESPaDOnS/NARVAL and the MaTYSSE programme.
ESPaDOnS and NARVAL are twin spectropolarimeters installed at the Cassegrain
focus of the 3.6-m CFHT on top of Maunakea (Hawaii), and of the 2-m Bernard
Lyot Telescope on top of Pic du Midi (France), respectively. Both include a
fibre-fed bench-mounted high-resolution spectrograph, yielding full coverage of the
370-1,000 nm wavelength range in a single exposure at a spectral resolving power
of 65,000. ESPaDOnS can also be fed from the 8-m Gemini-N Telescope next to
CFHT, through a 300-m fibre link called GRACES, yielding spectra with either
similar resolving power (in star-only mode) or half of it (in star + sky mode). In this
run, we secured 16 spectra with ESPaDOnS from 2015 November 17 to December
02 in fair weather conditions, 16 spectra with NARVAL from 2015 November 10
to December 17 in moderate to good weather, and 16 spectra with ESPaDOnS/GRACES
on four different nights towards the end of the run (with 2 spectra in
star-only mode and 14 spectra in star + sky mode). The full journal of observations
is given in Extended Data Table 1. ESPaDOnS and NARVAL observations were
secured in spectropolarimetric mode (circular polarization) in the framework of
the MaTYSSE (MAgnetic Topologies of Young Stars and the Survival of close-in
giant Exoplanets) Large Programme, whereas ESPaDOnS/GRACES observations
were collected in Director Discretionary Time in spectroscopic mode (no
polarimetric unit on Gemini-N). All spectra were derived from raw frames with the
reference pipeline implementing optimal extraction and radial velocity correction
from telluric lines, yielding a typical root-mean-square radial velocity precision
of 30 m s⁻¹ (ref. 33).

Deriving mean line profiles with LSD. LSD is a multiline technique similar to
cross-correlation, used to derive line profiles with enhanced signal-to-noise ratio
(S/N) from thousands of spectral lines simultaneously. For this study, the line list we
used for LSD is derived from spectrum synthesis through model atmospheres
computed assuming local thermodynamic equilibrium, for atmospheric parameters
relevant for V830 Tau (effective temperature of 4,250 K, logarithmic gravity of
4.0 and solar metallicity). The resulting S/N values in LSD profiles are in the range 950-1,540
(see the journal of observations in Extended Data Table 1), corresponding to average
multiplex gains in S/N of ~10. LSD was also applied to circularly polarized
spectra to retrieve average Zeeman signatures and longitudinal field estimates.
Doppler imaging of stellar surfaces and the modelling of differential rotation.
Doppler imaging is a tomographic technique inspired by medical imaging, with
which distributions of brightness features and magnetic fields at the surfaces of
rotating stars can be reconstructed from time series of high-resolution
spectropolarimetric observations. Doppler imaging is based on the fact that, thanks to the
Doppler effect, line profiles of rotating stars can be interpreted as one-dimensional
images of stellar surfaces, resolved in the Doppler direction but otherwise blurred.
By coupling many such one-dimensional images recorded at different rotation
phases, one can reliably reconstruct the parent surface distribution that gives
rise to the observed line profiles and rotational modulation. First introduced in
the late 1980s, Doppler imaging has been extensively used to investigate, with
unprecedented accuracy, surface features and magnetic fields in cool stars other
than the Sun. Technically speaking, Doppler imaging follows the principles of
maximum-entropy image reconstruction, and iteratively looks for the image with
lowest information content that fits the data at a given χ² level. Our previous
MaTYSSE studies give a detailed account
of all modelling steps. In this Letter, we carried out Doppler imaging modelling
using either unpolarized (Stokes I) spectra only, or both unpolarized and circularly
polarized (Stokes V) spectra simultaneously, with identical results regarding
filtering performances and the extraction of radial velocity signals.

By looking at how surface maps get twisted as a function of time, Doppler imaging
can also estimate the amount of latitudinal differential rotation shearing stellar
photospheres. In this study we assume a typical solar-like differential rotation
law in which the surface rotation rate varies with latitude θ as sin²θ, and depends
on two main parameters, the rotation rate at the equator Ωeq, and the difference in
rotation rate between the equator and the pole, dΩ. Both parameters are derived
by looking for the pair that minimizes the χ² of the fit to the data (at constant
information content in the reconstructed image, see Extended Data Fig. 1), whereas
the corresponding error bars are computed from the curvature of the χ² paraboloid
at its minimum. Although helpful to achieve a more accurate description of the
activity jitter and a cleaner filtering of raw radial velocity curves (at periods Prot
and Prot/2 in particular), differential rotation as weak as that of V830 Tau has little
impact on the filtered radial velocities; similar conclusions regarding the planet
signal are obtained when assuming that V830 Tau is rotating as a solid body.
A similar Doppler-imaging-based technique can be used to diagnose the presence
of hot Jupiters around active stars, with the planet parameters replacing those
describing differential rotation. This alternate method yields identical results to
those presented here for our data set (see Fig. 4).
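
The differential rotation law assumed above can be written out explicitly. In the sketch below, the equatorial rate is set from the 2.741-day rotation period quoted in the main text (assuming that this period applies at the equator), whereas the shear value `D_OMEGA` is a purely hypothetical placeholder chosen for illustration; the measured shear is not quoted in this section.

```python
import math

OMEGA_EQ = 2.0 * math.pi / 2.741   # rad/day, assuming 2.741 d is the equatorial period
D_OMEGA = 0.02                     # rad/day, hypothetical equator-pole shear


def rotation_rate(latitude_deg, omega_eq=OMEGA_EQ, d_omega=D_OMEGA):
    """Solar-like law: Omega(theta) = Omega_eq - dOmega * sin^2(theta)."""
    return omega_eq - d_omega * math.sin(math.radians(latitude_deg)) ** 2


for lat in (0, 30, 60, 90):
    omega = rotation_rate(lat)
    print(f"latitude {lat:2d} deg: period = {2.0 * math.pi / omega:.3f} d")

# Time for the equator to lap the pole by one full turn.
lap_time_days = 2.0 * math.pi / D_OMEGA
print(f"equator-pole lap time = {lap_time_days:.0f} d")
```

A weak shear of this order changes the rotation period by only a few hundredths of a day between equator and pole, which is consistent with the statement that it has little impact on the filtered radial velocities.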
Filtering LSD profiles from lunar contamination. Spectra recorded in
non-photometric conditions near full-moon epochs are often contaminated by solar
light reflected off the moon and diffused by clouds. To filter out this pollution,
whose location and width are well known at any given epoch but whose strength
we want to determine, we implement a dual-step Doppler imaging process. The
first step consists in applying Doppler imaging to the set of original LSD profiles,
with scaled-up error bars for all pixels potentially affected by lunar contamination;
the strength of the lunar contamination is then measured with a Gaussian fit to
the residuals, and subtracted from the polluted LSD profiles. In the second step,
conventional Doppler imaging is applied to the set of filtered LSD profiles with
original error bars. This dual-step process is found to be very efficient when applied
to densely sampled data sets like the one presented here, in which profiles at similar
phases but different rotation cycles provide a strong constraint on the strength of
lunar pollution. In our data, 6 LSD profiles suffer from a strong pollution (rotation
cycles 6.0 to 7.6, see Fig. 1), whereas 9 others are affected at a much weaker level.

If we exclude the 6 strongly moon-polluted profiles (or all 15 moon-polluted
profiles) from our data set, the radial velocity signal from V830 Tau b is still clearly
detected, though with a lower confidence rate of 99.9% (or 99%), reflecting the
poorer temporal coverage and the degraded window function (see Fig. 3a and
Extended Data Fig. 2a). This check shows that our decontamination process is
successful at restoring the original profile distortions and at retaining the radial
velocity content, provided the data set is dense enough.
Revisiting the original data set from 2014 December and 2015 January. A careful
re-analysis of our original data (consisting of two subsets shifted in time by
17 days, see Extended Data Table 3) indicates that variability occurred at the
surface of the star between the two subsets. Whereas Doppler imaging succeeds at
adjusting the main subset (9 evenly spaced points secured in 2015 January) down to
the noise level, as for our new data, fitting both subsets together requires us to lower
S/N values in the 2014 December subset by ~15% in order to reach unit reduced χ².
This is definite evidence that intrinsic variability (beyond pure differential rotation) 
occurred at the surface of V830 Tau between both subsets, with profile 6 of the 
2014 December subset being the most affected. This variability reflects a modi- 
fication in the brightness map, subtle enough to affect Doppler imaging no more 
than moderately, yet large enough to substantially affect filtered radial velocities, 
which are quite sensitive even to small features in the brightness map. It illustrates 
how tricky activity filtering can get when dealing with intrinsic variability, and how 
critical dense and even phase coverage is to diagnose it efficiently. 

As a result of this variability, our filtering analysis can only be applied to the 
individual subsets of our original data, and in fact to no more than the 2015 January 
subset, the other being far too sparse and uneven for the technique to perform reli- 
ably. We find that the filtered radial velocities from the main subset (see Extended 
Data Table 3) agree with those of our new data; fitting them together improves the 
confidence level at which the planet is detected, but not the accuracy on the orbital 
period (see Extended Data Fig. 2c). 

Enabling us to shortcut the computation of filtered radial velocities, the
Doppler-imaging-based method of adjusting the planet parameters simultaneously
with the distribution of surface features offers an alternative way to confirm that
the radial velocity signal from the detected planet is present in our original data.
(This method, however, still suffers from the inability of Doppler imaging to describe
intrinsic variability beyond differential rotation.) Freezing the planet orbital period
to the value found in our new analysis (4.93 days) and applying this technique
to the full set of our original data with only profile 6 removed, we find a
semi-amplitude of 67 ± 18 m s⁻¹ for the planet radial velocity signature, which agrees
well with the measurement derived from our main study. We also confirm that
our original data are better explained by a model including a planet in a 4.93-day
orbit than by one with no planet, with a false-alarm probability of ~0.01%
(corresponding to a χ² increase of 19 for the 644 data points of the fitted LSD profiles).
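
As a rough consistency check of the quoted false-alarm probability, one can convert the χ² change into a probability. The sketch below assumes (this is our assumption, not a statement from the text) that with the period frozen and a circular orbit the planet model adds two free parameters, semi-amplitude and phase, so the χ² change follows a χ² distribution with 2 degrees of freedom, whose survival function is exp(−Δχ²/2). The function name is hypothetical.

```python
import math

def false_alarm_probability_2dof(delta_chi2):
    """Survival function of a chi-squared distribution with 2 degrees of freedom.

    Assumes the extra model component adds exactly two free parameters.
    """
    return math.exp(-0.5 * delta_chi2)

fap = false_alarm_probability_2dof(19.0)
print(f"FAP = {100.0 * fap:.4f} %")
```

Under this assumption a χ² change of 19 gives a probability slightly below 0.01%, in line with the order of magnitude quoted above.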
Code availability. The Doppler imaging code used for this study is as yet undoc- 
umented and has thus not been released in the public domain. 
differential rotation parameters Ω_eq and dΩ, denoting respectively the
day. The outer colour contour traces the 99.99% confidence interval
Extended Data Figure 2 | Complementary periodograms. a, Same as
Fig. 3a (middle plot) using the BGLS approach, showing that the 4.93-day
peak we detect is at least 10° times more likely than any other features. 

b, Same as Fig. 3b for Hα emission, another activity proxy, featuring a
clear peak at the stellar rotation period but no power at the planet orbital 
period. c, Same as Fig. 3a (middle plot) for our new data combined with 
the 2015 January subsample of our original data. The planet is now
detected with a higher confidence level (FAP < 10°) but the accuracy of 
the orbital period is not much improved (with multiple nearby peaks of 
similar strength). The red and green dashed lines are for the original and 
new data respectively. d, Same as c, zooming on the orbital frequency. 
The orbital period corresponding to the strongest peak is equal to
4.924 ± 0.004 days (1σ error bar).
Extended Data Figure 3 | Periodograms of simulated data. a, Same as
Fig. 3, for simulated data computed using the brightness map, differential
rotation and planet parameters inferred from the real data, and assuming
the same coverage and similar S/N (equal for all LSD profiles). As for our
observations, the planet signal is detected at a confidence level >99.9%
in the filtered radial velocities despite being invisible in the raw radial
velocities, and the planet parameters are well recovered. The periodogram
of the raw radial velocities is very similar to that of Fig. 3, featuring the
main peaks (at P_rot and P_rot/2) and their aliases; residual radial velocities
mostly reflect the noise in the data. b, Same as the upper two panels of a
but with no planet included in the simulation. No signal with a confidence
level >90% is recovered in the filtered radial velocities, demonstrating that
the filtering process is not generating spurious radial velocity signals, in
particular at a period of 4.93 days.
Fig. 3a (using the same brightness map, differential rotation and planet
parameters as those of V830 Tau b). The simulated radial velocities
(periodograms shown in Extended Data Fig. 3a) share obvious similarities
with the observed ones, and the planet signal is safely recovered

As for the periodogram (see Extended Data Fig. 3b, bottom panel), no
signal is detected, further illustrating that activity induces no spurious
planet signature. As for Fig. 2, simulated radial velocity measurements are
depicted in all panels with their 1σ error bars.
Extended Data Table 1 | Journal of observations

UT date  Instrument          BJD         R    t_exp  S/N  S/N_LSD  Rot. cycle r  Orb. cycle o  Raw RV   Filt. RV  RV err
(2015)                       (2457300+)  (K)  (s)                  (118+)        (-5+)         (km/s)   (km/s)    (km/s)

Nov 11   NARVAL              37.5066     65   4800   90   1149     0.828         0.332         -1.362   -0.150    0.067
Nov 12   NARVAL              38.6187     65   4800   100  1251     1.233         0.557         0.365    0.074     0.061
Nov 13   NARVAL              39.5077     65   4800   103  1278     1.558         0.738         0.960    0.022     0.059
Nov 14   NARVAL              40.5427     65   4800   95   1233     1.935         0.948         -0.828   -0.035    0.062
Nov 15   NARVAL              41.6051     65   4800   89   1152     2.323         1.163         0.504    0.005     0.067
Nov 16   NARVAL              42.6875     65   4800   105  1257     2.700         1.373         0.137    -0.036    0.060
Nov 17   NARVAL              43.5347     65   4800   84   1113     3.027         1.555         -0.151   0.031     0.070
Nov 18   ESPaDOnS            44.9422     65   2780   162  1420     3.540         1.840         0.830    -0.016    0.048
Nov 21   ESPaDOnS            48.0862     65   2780   168  1380     4.687         2.478         0.382    0.031     0.050
Nov 22   ESPaDOnS            48.9770     65   2780   170  1428     5.012         2.658         -0.199   0.060     0.048
Nov 23   ESPaDOnS            49.9011     65   2780   169  1428     5.350         2.846         0.489    0.087     0.048
Nov 24   ESPaDOnS            50.9411     65   2780   158  1396     5.729         3.057         0.317    -0.041    0.050
Nov 25   ESPaDOnS            51.9015     65   2780   154  1482     6.079         3.252         0.312    -0.120    0.048
Nov 25   ESPaDOnS            51.9372     65   2780   164  1480     6.092         3.259         -0.243   -0.040    0.047
Nov 26   ESPaDOnS            52.8917     65   2780   164  1539     6.441         3.452         0.089    -0.126    0.047
Nov 27   ESPaDOnS            53.9535     65   2780   160  1439     6.828         3.668         -1.207   -0.016    0.049
Nov 28   ESPaDOnS            54.9089     65   2780   144  1458     7.177         3.862         -0.014   0.018     0.047
Nov 28   ESPaDOnS            54.9449     65   2780   131  1453     7.190         3.869         0.020    -0.028    0.048
Nov 29   ESPaDOnS            55.8153     65   2780   166  1434     7.507         4.045         0.648    0.029     0.048
Nov 30   ESPaDOnS            56.9632     65   2780   175  1436     7.926         4.278         0.992    -0.123    0.048
Dec 01   ESPaDOnS            57.8164     65   2780   167  1447     8.237         4.451         0.393    0.014     0.047
Dec 02   NARVAL              58.5216     65   4800   89   1131     8.495         4.594         0.602    0.078     0.069
Dec 02   ESPaDOnS            58.9744     65   2780   168  1428     8.660         4.686         0.823    0.113     0.049
Dec 02   NARVAL              59.4928     65   4800   82   1089     8.849         4.791         -1.054   0.148     0.073
Dec 03   ESPaDOnS            60.0148     65   2780   176  1458     9.039         4.897         -0.163   0.073     0.047
Dec 04   NARVAL              60.5247     65   4800   90   1191     9.225         5.001         0.328    0.019     0.064
Dec 07   NARVAL              64.5183     65   4800   108  1221     10.682        5.811         0.568    0.151     0.064
Dec 09   NARVAL              66.4571     65   4800   89   1161     11.390        6.204         0.155    -0.068    0.066
Dec 11   NARVAL              68.4771     65   4800   88   1218     12.127        6.614         -0.148   0.103     0.064
Dec 12   NARVAL              69.4897     65   4800   69   978      12.496        6.819         0.578    0.024     0.083
Dec 13   NARVAL              70.5011     65   4800   104  1335     12.865        7.024         -1.107   0.055     0.057
Dec 15   ESPaDOnS / GRACES   71.9290     35   540    218  1385     13.386        7.314         0.178    -0.048    0.068
Dec 15   ESPaDOnS / GRACES   71.9359     35   540    214  1388     13.389        7.315         0.148    -0.068    0.068
Dec 17   NARVAL              74.4815     65   4800   90   1257     14.317        7.832         0.617    0.080     0.060
Dec 18   ESPaDOnS / GRACES   74.7548     35   85     72   974      14.417        7.887         0.222    0.058     0.073
Dec 18   ESPaDOnS / GRACES   74.7565     35   85     73   992      14.418        7.888         0.203    0.039     0.072
Dec 18   ESPaDOnS / GRACES   74.7581     35   85     78   990      14.418        7.888         0.222    0.057     0.072
Dec 18   ESPaDOnS / GRACES   74.7597     35   85     77   996      14.419        7.888         0.242    0.078     0.071
Dec 18   ESPaDOnS / GRACES   75.0425     65   360    123  1396     14.522        7.946         0.788    0.016     0.045
Dec 18   ESPaDOnS / GRACES   75.0474     65   360    107  1369     14.524        7.947         0.840    0.053     0.046
Dec 21   ESPaDOnS / GRACES   77.7761     35   85     75   990      15.519        8.500         0.792    0.037     0.072
Dec 21   ESPaDOnS / GRACES   77.7777     35   85     74   986      15.520        8.500         0.766    0.006     0.072
Dec 21   ESPaDOnS / GRACES   77.7793     35   85     74   982      15.520        8.501         0.729    -0.035    0.073
Dec 21   ESPaDOnS / GRACES   77.7810     35   85     74   976      15.521        8.501         0.819    0.050     0.073
Dec 22   ESPaDOnS / GRACES   78.8110     35   85     58   946      15.897        8.710         -0.864   0.161     0.076
Dec 22   ESPaDOnS / GRACES   78.8126     35   85     64   956      15.897        8.710         -0.928   0.093     0.076
Dec 22   ESPaDOnS / GRACES   78.8143     35   85     61   964      15.898        8.711         -0.880   0.139     0.074
Dec 22   ESPaDOnS / GRACES   78.8159     35   85     62   958      15.899        8.711         -0.929   0.087     0.075

Rotation and orbital cycles r and o are respectively given by the ephemeris BJD = 2,457,011.8 + 2.741r and 2,457,360.52 + 4.93o. For each observation, R, S/N and S/N_LSD list the resolving power (in thousands), the
S/N in the raw spectrum and the S/N in the LSD profile, whereas the last three columns list the raw and filtered radial velocities (RV) and the corresponding 1σ error bars (reflecting both the photon
noise and the instrumental radial velocity precision). The first and fifth columns respectively list the observing dates in universal time (UT) and the corresponding exposure time t_exp.
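
The cycle columns follow directly from the two ephemerides quoted in the table note. A minimal sketch (function names are ours, chosen for illustration) that converts a Barycentric Julian Date into the rotation and orbital cycles, and splits each into the integer offset used in the column header and the tabulated remainder:

```python
import math

ROT_T0, ROT_PERIOD = 2457011.8, 2.741      # rotation ephemeris from the table note
ORB_T0, ORB_PERIOD = 2457360.52, 4.93      # orbital ephemeris from the table note

def cycle(bjd, t0, period):
    """Cycle number (integer part plus phase) at a given BJD."""
    return (bjd - t0) / period

def split_cycle(c):
    """Return (integer offset, remainder in [0, 1))."""
    whole = math.floor(c)
    return whole, c - whole

# First entry of the journal of observations: BJD = 2457300 + 37.5066
bjd = 2457300.0 + 37.5066
print(split_cycle(cycle(bjd, ROT_T0, ROT_PERIOD)))
print(split_cycle(cycle(bjd, ORB_T0, ORB_PERIOD)))
```

Note that the orbital cycle is negative for the first epochs, because the reference transit time falls inside the observing run; this is why the header offset for that column is −5.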
Extended Data Table 2 | Main parameters of the planet and of the host star

Orbital period  K       BJD of transit    Orbital distance  M_planet sin i  M_planet
(d)             (m/s)                     (au)              (M_Jup)         (M_Jup)
4.93±0.05       75±11   2457360.52±0.10   0.057±0.001       0.63±0.11       0.77±0.15

M_star     R_star   Age    T_eff    log(L/L☉)  P_rot  v sin i   i      Distance
(M☉)       (R☉)     (Myr)  (K)                 (d)    (km/s)    (°)    (pc)
1.00±0.05  2.0±0.2  ~2     4250±50  0.08±0.10  2.741  30.5±0.5  55±10  131±3

Top, main parameters (with 1σ error bars) of V830 Tau b assuming a circular orbit. Bottom, main parameters (with 1σ error bars) of the host star V830 Tau, with T_eff denoting the effective temperature
of the photosphere.
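
The planet parameters in the top row can be cross-checked against the stellar parameters in the bottom row. The sketch below is our own illustration, not the authors' procedure: it uses the standard circular-orbit relation M_planet sin i = K (P / 2πG)^(1/3) M_star^(2/3), valid when the planet is much lighter than the star, Kepler's third law for the orbital distance, and the additional assumption that the orbital inclination equals the stellar inclination i = 55°. The physical constants are rounded values.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
M_JUP = 1.898e27     # kg
AU = 1.496e11        # m
DAY = 86400.0        # s

def minimum_mass_mjup(k_ms, period_d, mstar_msun):
    """M_planet sin i in Jupiter masses for a circular orbit, M_planet << M_star."""
    p = period_d * DAY
    m = mstar_msun * M_SUN
    return k_ms * (p / (2.0 * math.pi * G)) ** (1.0 / 3.0) * m ** (2.0 / 3.0) / M_JUP

def orbital_distance_au(period_d, mstar_msun):
    """Semi-major axis from Kepler's third law, neglecting the planet mass."""
    p = period_d * DAY
    return (G * mstar_msun * M_SUN * p ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0) / AU

msini = minimum_mass_mjup(75.0, 4.93, 1.00)
mass = msini / math.sin(math.radians(55.0))   # assumes orbit aligned with stellar spin
print(round(msini, 2), round(mass, 2), round(orbital_distance_au(4.93, 1.00), 3))
```

With the tabulated K, period and stellar mass, these relations give values consistent with the quoted 0.63, 0.77 and 0.057 within the stated error bars.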
Extended Data Table 3 | Journal of observations for the original data

UT date      Instrument  BJD         R    t_exp  S/N  S/N_LSD  Rot. cycle r  Raw RV   Filt. RV  RV err
(2014-2015)              (2457000+)  (K)  (s)                                (km/s)   (km/s)    (km/s)

Dec 20       ESPaDOnS    11.8899     65   2800   170  1501     0.033         0.721    n/a       0.049
Dec 21       ESPaDOnS    12.8622     65   2800   170  1504     0.388         0.197    n/a       0.050
Dec 22       ESPaDOnS    13.9010     65   2800   180  1520     0.767         -0.513   n/a       0.049
Dec 28       ESPaDOnS    20.0190     65   2800   140  1498     2.999         0.446    n/a       0.049
Dec 29       ESPaDOnS    20.8759     65   2800   160  1478     3.311         -0.137   n/a       0.050
Dec 30       ESPaDOnS    21.8154     65   2800   160  1478     3.654         0.240    n/a       0.050
Jan 07       ESPaDOnS    29.8629     65   2800   170  1498     6.590         -0.044   0.031     0.049
Jan 08       ESPaDOnS    30.8217     65   2800   180  1490     6.940         -0.206   -0.031    0.049
Jan 09       ESPaDOnS    31.8181     65   2800   170  1523     7.303         -0.151   -0.062    0.051
Jan 10       ESPaDOnS    32.8186     65   2800   150  1476     7.668         0.044    -0.011    0.050
Jan 11       ESPaDOnS    33.8669     65   2800   180  1495     8.051         1.085    0.076     0.050
Jan 12       ESPaDOnS    34.7215     65   2800   170  1473     8.362         0.246    0.073     0.050
Jan 13       ESPaDOnS    35.7150     65   2800   160  1501     8.725         -0.264   -0.001    0.050
Jan 14       ESPaDOnS    36.7141     65   2800   170  1501     9.089         0.929    -0.067    0.050
Jan 15       ESPaDOnS    37.8043     65   2800   170  1470     9.487         -0.202   -0.052    0.050

Same as Extended Data Table 1 for the 2014 December and 2015 January data. Our filtering technique is found to be reliable only for data sets with dense and regular phase coverage, hence the
absence of filtered radial velocities for the 2014 December subset of 2 × 3 points, which does not satisfy this requirement. Filtered radial velocities from 2015 January 07-15 agree well with those
derived from our new data (see Extended Data Table 1).
Tunable two-dimensional arrays of single Rydberg
atoms for realizing quantum Ising models


Henning Labuhn, Daniel Barredo, Sylvain Ravets, Sylvain de Léséleuc, Tommaso Macri, Thierry Lahaye & Antoine Browaeys


Spin models are the prime example of simplified many-body
Hamiltonians used to model complex, strongly correlated real-world
materials¹. However, despite the simplified character of such
models, their dynamics often cannot be simulated exactly on classical
computers when the number of particles exceeds a few tens. For this
reason, quantum simulation² of spin Hamiltonians using the tools of
atomic and molecular physics has become a very active field over the
past years, using ultracold atoms³ or molecules⁴ in optical lattices,
or trapped ions⁵. All of these approaches have their own strengths
and limitations. Here we report an alternative platform for the study
of spin systems, using individual atoms trapped in tunable
two-dimensional arrays of optical microtraps with arbitrary geometries,
where filling fractions range from 60 to 100 per cent. When excited to
high-energy Rydberg D states, the atoms undergo strong interactions
whose anisotropic character opens the way to simulating exotic
matter⁶. We illustrate the versatility of our system by studying the
dynamics of a quantum Ising-like spin-1/2 system in a transverse
field with up to 30 spins, for a variety of geometries in one and
two dimensions, and for a wide range of interaction strengths. For
geometries where the anisotropy is expected to have small effects on
the dynamics, we find excellent agreement with ab initio simulations
of the spin-1/2 system, while for strongly anisotropic situations the
multilevel structure of the D states has a measurable influence⁷,⁸.
Our findings establish arrays of single Rydberg atoms as a versatile
platform for the study of quantum magnetism.

Rydberg atoms have recently attracted a lot of interest for quantum
information processing⁹ and quantum simulation¹⁰. In this work, we
use individual Rydberg atoms to realize quantum Ising magnets, with
unprecedented flexibility in the geometry of the arrays. By shining
on the atoms lasers that are resonant with the transition between the
ground state |g⟩ and a chosen Rydberg state |r⟩ (Fig. 1a), we implement
the Ising-like Hamiltonian

\[ H = \sum_i \frac{\hbar\Omega}{2}\,\sigma_x^i + \sum_{i<j} V_{ij}\, n^i n^j \qquad (1) \]

which acts on the pseudo-spin states |↓⟩_i and |↑⟩_i corresponding to
states |g⟩ and |r⟩ of atom i, respectively. Here, Ω is the Rabi frequency
of the laser coupling, the σ_α^i (α = x, y, z) are the Pauli matrices acting
on atom i, and n^i = (1 + σ_z^i)/2 is the number of Rydberg excitations
(0 or 1) on site i. The term V_ij arises from the van der Waals interaction
between atoms i and j when they are both in |r⟩, and scales as
C_6(θ)/|r_i − r_j|^6 with the separation r_i − r_j between the atoms. Moreover,
for |r⟩ = |nD_{3/2}, m_j = 3/2⟩, the van der Waals coefficient C_6 is
anisotropic⁷,¹¹, varying by a factor ~3 when the angle θ between the
interatomic axis and the quantization axis z changes from 0 to π/2 (Fig. 1a


Figure 1 | Many-body dynamics of Rydberg atoms and experimental platform.
a, Rydberg blockade between two atoms (main panel, lower right), each
considered as a two-level system (grey inset): owing to strong interactions
between the Rydberg states |r⟩, the excitation of two nearby atoms (within
the blockade radius R_b) is inhibited. The use of nD_{3/2} states for |r⟩
gives rise, when the description of the atoms is reduced to a two-level
model, to an anisotropic effective van der Waals potential C_6(θ)/R^6
(see inset). b, For a value of R_b comparable to the distance between
adjacent atoms (top), the dynamics becomes richer. Configurations where
two neighbouring atoms are excited are energetically forbidden (red
crosses), yielding strong correlations between the Rydberg
excitations in the allowed configurations (green
ticks). c, An array of microtraps is created by
imprinting an appropriate phase on a dipole-trap
beam. Site-resolved fluorescence of the
atoms, at 780 nm, is imaged on an electron-multiplying
charge-coupled device (EMCCD)
camera using a dichroic mirror (DM). Rydberg
excitation beams at 795 and 475 nm are shone
onto the atoms. Inset, measured light intensity
for an array of N_t = 19 traps.
Figure 2 | Collective oscillations in the full Rydberg blockade regime.
a, Probability P_0 for all N atoms to be in |g⟩ after an excitation pulse of area
Ωτ, for five values of N from 1 to 15. Red points, fully loaded arrays,
n = 82; blue points, partially loaded triangular arrays of N_t = 19 traps,
n = 100 (error bars show the quantum projection noise for ~100
repetitions of the experiment). Solid lines are fits by damped sines of
frequency Ω_N. The right panels depict the atomic positions. b, Collective
oscillation frequency Ω_N/Ω versus N (error bars, sometimes smaller than
the symbol size, are s.d.; colour code for points as in a). The solid line is the
expected √N enhancement.


inset). The strong interactions between the Rydberg states induce 
correlations in the positions of the excitations (Fig. 1b), as we study 
experimentally below. 

Our set-up (Fig. 1c) has been described in refs 12 and 13. We trap
cold (T ≈ 30 μK) single ⁸⁷Rb atoms in optical traps with a 1 μm waist
from a magneto-optical trap (MOT). Using a spatial light modulator
(SLM), we create arbitrary, two-dimensional arrays containing
1 < N_t < 50 traps, separated by distances a > 3 μm. The atomic
fluorescence at 780 nm is imaged onto a camera. We observe, in the
single-atom regime, that the level of fluorescence for each trap
alternates randomly between two levels, corresponding to the presence of
0 or 1 atom. The analysis of these N_t fluorescence traces allows us to
record, with a time resolution of 50 ms, the current number N of single
atoms in the array.

As soon as N exceeds a predefined threshold, we trigger the following
experimental sequence. First, the loading of the array is stopped, and
a fluorescence image is acquired to record the initial configuration of
the atoms, that is, which traps are filled. A 6 G magnetic field is then
switched on along the z axis and defines the quantization axis. After
initializing all the atoms in |g⟩ = |5S_{1/2}, F = 2, m_F = 2⟩ by optical
pumping, a two-photon Rydberg excitation pulse of duration τ is shone onto
the atoms; the Rabi frequency (Ω ≈ 2π × 1 MHz) is uniform to within
10% over the array. We then acquire a fluorescence image of the final
configuration by switching on the MOT beams. Atoms excited to |r⟩
quickly escape the trapping region, and thus we observe only the atoms
that were in |g⟩ after excitation. The atoms that have been lost between
the initial and final images are thus assigned to Rydberg states. This
detection method has a high efficiency: it only gives a small number of
'false positives', as an atom also has a probability ε ≈ (3 ± 1)% of being
lost, independently of its internal state (Methods).

We first test our system in the conceptually simple situation of fully
Rydberg-blockaded ensembles (that is, with at most one Rydberg
excitation) containing up to N = 15 atoms. Figure 2a shows, for various
arrays, the probability P_0 that all N atoms are in |g⟩ at the end of the
sequence. We observe high-contrast coherent oscillations, with a
frequency enhanced by a factor √N with respect to the single-atom
case (Fig. 2b). This characteristic collective oscillation is the hallmark
of Rydberg blockade¹⁴–¹⁷, where multiple excitations are inhibited
within a blockaded volume (which, owing to the anisotropy, is close to
an ellipsoid, with a major radius R_b defined by ħΩ = |C_6(0)|/R_b^6, and a
small 'flattening' 3^{1/6} ≈ 1.2). This observation is a first step towards the
creation of long-lived |W⟩ entangled states (the symmetric combination
of the N states with a single excitation) in the ground state.
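
The √N enhancement can be reproduced numerically for a handful of atoms. The following is a minimal sketch, not the simulation used for the figures: it assumes a Hamiltonian made of a transverse term (Ω/2)σ_x on each atom plus a pairwise interaction V_ij n^i n^j, sets ħ = 1, takes an isotropic interaction much larger than Ω, and uses exact diagonalization with NumPy. All function names are ours.

```python
import numpy as np
from functools import reduce

SX = np.array([[0.0, 1.0], [1.0, 0.0]])   # couples |g> and |r>
NR = np.array([[0.0, 0.0], [0.0, 1.0]])   # projector on |r> (basis order: g, r)
ID = np.eye(2)

def site_op(op, i, n):
    """Embed a single-atom operator acting on atom i into the n-atom space."""
    return reduce(np.kron, [op if k == i else ID for k in range(n)])

def ising_hamiltonian(n, omega, v):
    """H = sum_i (omega/2) sigma_x^i + sum_{i<j} v[i][j] n^i n^j, with hbar = 1."""
    h = sum(0.5 * omega * site_op(SX, i, n) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            h = h + v[i][j] * site_op(NR, i, n) @ site_op(NR, j, n)
    return h

def ground_probability(h, t):
    """Probability P_0 that all atoms are in |g> after evolving for a time t."""
    evals, evecs = np.linalg.eigh(h)
    psi0 = np.zeros(h.shape[0])
    psi0[0] = 1.0                                   # index 0 is |g g ... g>
    psi_t = evecs @ (np.exp(-1j * evals * t) * (evecs.T @ psi0))
    return float(abs(psi_t[0]) ** 2)

n_atoms, omega = 3, 1.0
h = ising_hamiltonian(n_atoms, omega, np.full((n_atoms, n_atoms), 1000.0))
t_pi = np.pi / (np.sqrt(n_atoms) * omega)           # collective pi pulse
print(ground_probability(h, t_pi), ground_probability(h, 2 * t_pi))
```

In the fully blockaded limit the ground-state probability should nearly vanish after the collective π pulse and revive after twice that time, which is the √N-enhanced oscillation discussed above.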

The fully blockaded regime remains easy to describe theoretically as
the blockade naturally truncates the size of the Hilbert space. In contrast,
a more challenging regime corresponds to the Rydberg blockade being
effective only between nearest neighbours, such that for long enough
excitation times, the number of excitations becomes ~N/2. It is therefore
desirable to be able to vary the ratio α = R_b/a of the blockade radius
to the distance a between sites: for very small or large values of α, the
dynamics is simple and the system can easily be compared to numerics,
while, for intermediate values of α, the dynamics is challenging to
calculate and experimental quantum simulation becomes relevant. Our
set-up is particularly adapted to this goal, as we can easily vary both
a (by reconfiguring the SLM) and R_b (by changing the principal quantum
number n, we tune C_6, which scales approximately as n^11).
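
The two tuning knobs can be made concrete with a short sketch. It only uses relations stated in the text: the blockade radius defined by ħΩ = |C_6|/R_b^6, the approximate n^11 scaling of C_6 (so that R_b scales roughly as n^(11/6) at fixed Ω), and the factor ~3 anisotropy of C_6 that yields the 3^(1/6) flattening. The helper names are hypothetical, and no measured C_6 value is assumed.

```python
import math

HBAR = 1.054571817e-34   # J s

def blockade_radius(c6, omega):
    """R_b such that hbar*omega = |C6| / R_b**6 (c6 in J m^6, omega in rad/s)."""
    return (abs(c6) / (HBAR * omega)) ** (1.0 / 6.0)

def radius_ratio(n_from, n_to):
    """Ratio of blockade radii at fixed omega, assuming C6 scales as n**11."""
    return (n_to / n_from) ** (11.0 / 6.0)

def alpha(c6, omega, spacing):
    """Dimensionless ratio of blockade radius to site spacing."""
    return blockade_radius(c6, omega) / spacing

flattening = 3.0 ** (1.0 / 6.0)   # C6 changes by a factor ~3 between theta = 0 and pi/2
print(round(flattening, 3), round(radius_ratio(50, 100), 2))
```

Because of the sixth root, R_b depends only weakly on Ω, which is why changing n and the array spacing are the practical ways to tune α.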

This versatility is illustrated in Fig. 3, where we use a fully loaded
ring-shaped array of N = 8 traps, thus realizing a small spin chain with
periodic boundary conditions (PBC). By varying both a and n, we
tune the system all the way from independent atoms (α < 1), where
each atom undergoes a Rabi oscillation at frequency Ω, resulting in a
Rydberg fraction f_R (defined as the average number of Rydberg
excitations divided by N) periodically reaching ~1 (Fig. 3a), to a fully
blockaded array (α > 1, Fig. 3c) characterized by collective oscillations at
frequency √N Ω and a maximum f_R = 1/N. In between (Fig. 3b, where
α ≈ 1.5), the evolution of f_R(τ) shows oscillations resulting from the
beating of the incommensurate eigenfrequencies of the many-body
Hamiltonian, equation (1). Our system allows us to detect the state
of each atom, and thus to measure correlation functions. Figure 3d
shows the dynamics of the Rydberg-Rydberg pair correlation
function:


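\[ g^{(2)}(k) = \frac{1}{N_t} \sum_i \frac{\langle n_i\, n_{i+k} \rangle}{\langle n_i \rangle \langle n_{i+k} \rangle} \qquad (2) \]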


The averaging over all traps does not wash out correlations despite
the fact that the system is not fully invariant by translation (Methods).
We observe a strong suppression of g^(2)(k) for k = 1 and k = 7, that is,
a clear signature of nearest-neighbour blockade. For some times (see
for example, Ωτ = 3.1), we observe an antiferromagnetic-like staggered
correlation function, while the average density is uniform (Methods).
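
A pair correlation function of this kind can be estimated from repeated single-shot images. The sketch below is our illustration rather than the authors' analysis code: it assumes each shot is a row of 0/1 Rydberg occupations on a ring, that the correlation is normalized by the product of the mean occupations of the two sites, and that every site has a non-zero mean occupation (otherwise the normalization is undefined). The function name is hypothetical.

```python
import numpy as np

def pair_correlation(shots, k):
    """Site-averaged <n_i n_{i+k}> / (<n_i><n_{i+k}>) on a ring.

    shots: array of shape (n_shots, n_sites) with entries 0 or 1.
    """
    shots = np.asarray(shots, dtype=float)
    shifted = np.roll(shots, -k, axis=1)        # column i now holds n_{i+k}
    joint = (shots * shifted).mean(axis=0)      # <n_i n_{i+k}> over shots
    density = shots.mean(axis=0)                # <n_i>
    density_shifted = shifted.mean(axis=0)      # <n_{i+k}>
    return float(np.mean(joint / (density * density_shifted)))

# Toy data: two perfectly staggered configurations on a four-site ring.
shots = [[1, 0, 1, 0],
         [0, 1, 0, 1]]
print(pair_correlation(shots, 1), pair_correlation(shots, 2))
```

For the staggered toy data the nearest-neighbour value vanishes while the next-nearest-neighbour value exceeds 1, mimicking the antiferromagnetic-like pattern mentioned above even though the mean density is uniform.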

The solid lines in all panels of Fig. 3 are obtained by solving the
Schrödinger equation governed by equation (1) using the
independently measured experimental parameters, and then including
the effects of the finite detection errors ε (Methods). One observes an
overall agreement with the data, although some small discrepancies can
clearly be noticed, especially at longer times. We attribute them to the
Zeeman structure of Rydberg D states, which is not taken into account
in our modelling by a spin-1/2: for θ ≠ 0, the van der Waals interaction


© 2016 Macmillan Publishers Limited. All rights reserved 


Figure 3 | Tuning interactions in an eight-spin
chain with PBC. a, Independent atoms (R_b < a).
Right, the Rydberg fraction f_R oscillates between
~0 and ~1 with the single-atom Rabi frequency
Ω. b, Strongly correlated regime (R_b ≈ 1.5a).
Right, the Rydberg fraction shows an oscillatory
behaviour involving several frequencies. c, Fully
blockaded regime: f_R oscillates at √N Ω (right),
and reaches a maximum of 1/N. In a-c, the left
diagram shows the eight-spin chain, and the
shaded ellipsoids illustrate the (anisotropic)
blockade volume. d, The Rydberg-Rydberg pair
correlation function g^(2)(k), for the parameters of b, is
shown for increasing values of Ωτ (top to
bottom; panels labelled Ωτ = 2.5 and Ωτ = 3.1). In all plots, the solid lines are obtained
by numerically solving the time-dependent
Schrödinger equation, and then including
detection errors (ε = 3%). Error bars (often
smaller than symbol size) denote s.e.m.
couples |r⟩ to other Zeeman states, leading to a slow increase in the
number of excitations (Methods).

We now study two systems containing a larger number of atoms. We
first consider a one-dimensional spin chain with PBC comprising
N_t = 30 traps and partially loaded with N = 20 ± 1.5 atoms (Fig. 4a; we
have checked that the 67% filling fraction does not change qualitatively
the physics as compared to a perfect filling, see Methods). Its 'racetrack'
shape was chosen to optimize homogeneity of the Rabi frequency over
the array. We chose parameters such that α ≈ 4.3(1). The Rydberg
fraction f_R(τ) shows initial oscillations before reaching a steady state
(Fig. 4b) due to the dephasing of the many incommensurate
eigenfrequencies of equation (1) for this large value of N. The pair correlation
function (shown in Fig. 4c for Ωτ ≈ 2.0) is strongly suppressed for
k < α, as expected from blockade physics, before oscillating towards
the asymptotic value g^(2)(k ≫ α) = 1 (refs 18, 19). A similar liquid-like
correlation function has been observed in two dimensions²⁰. The solid


Figure 4 | Ising dynamics in large spin
ensembles. a, Racetrack-shaped array with
N_t = 30 traps, loaded with N = 20 ± 1.5 atoms.
The blockade radius R_b is about 4.3a (shaded
ellipsoid). b, c, Properties of system shown in a.
b, Time evolution of the Rydberg fraction f_R.
c, Rydberg pair correlation function g^(2)(k) for
Ωτ = 2.0, showing a strong depletion for k < R_b,
and contrasted oscillations around the asymptotic
value 1 (the data are shown only for k > 0, as
they are symmetric under the transformation
k → −k). Error bars (most of the time smaller
than symbol size) denote the s.e.m. Solid lines
are the simulation results without any adjustable
parameters. d, Square array of 7 × 7 traps loaded
with N = 28 ± 1.6 atoms. The blockade radius
is about 2.6a. e, f, Properties of system shown
in d. e, Evolution of f_R. Error bars (most of the
time smaller than symbol size) denote s.e.m.
Solid lines are the simulation results without
any adjustable parameters. f, Rydberg-Rydberg
correlation function g^(2)(k, l) for Ωτ = 5.3.
lines in Fig. 4b,c give the result of a full numerical simulation, without
any adjustable parameters. Here the agreement with the spin-1/2 model
is excellent, as many atom pairs are aligned along the quantization axis,
thus making the effects of the anisotropy small. We included the finite
value of ε, which has a strong effect on the pair correlations for k < α,
as g^(2)(k) increases from 0 to 2ε/f_R (Methods).

As a final setting, we use an N_t = 7 × 7 two-dimensional square array
(Fig. 4d), loaded with N = 28 ± 1.6 atoms (57% filling), for α = 2.6.
The dynamics of f_R now appears monotonic (Fig. 4e), without the
initial oscillations seen above for smaller systems (Figs 3b and 4b).
This suggests that with N = 30 atoms, the behaviour of the system is
already close to the many-body behaviour observed in large
ensembles²¹, with a fast initial rise of the Rydberg fraction, before it saturates.
The simulation captures the initial rise of f_R well, but does not
reproduce the slow increase observed at long times, which we attribute again
to multilevel effects (that are indeed expected to be strong in this array
where the internuclear axes of many pairs lie at a large θ). Figure 4f
shows the two-dimensional Rydberg-Rydberg correlation function

\[ g^{(2)}(k, l) = \frac{1}{N_t} \sum_{i,j} \frac{\langle n_{i,j}\, n_{i+k,j+l} \rangle}{\langle n_{i,j} \rangle \langle n_{i+k,j+l} \rangle} \qquad (3) \]


where n_{i,j} refers to the site with coordinates (ia, ja). Although the
system has open boundaries and thus does not show translational
invariance, the averaging over the traps in equation (3) does not wash out
correlations as R_b is small compared to the system size. We observe a
depletion of the correlation function close to the origin due to blockade. 
The anisotropy of the interaction is visible, as the depletion region is 
elliptical, with a flattening close to the expected value 1.2. We observe, 
in the full time evolution of the correlation function (Methods), that 
blockade volumes become more densely packed with increasing time. 

The wide tunability of geometry and interactions demonstrated
here opens the way to investigating the physics of spin systems with
tens of particles. Our platform, especially when combined with
quasi-deterministic loading of optical tweezers as demonstrated recently²²,²³,
will be ideally suited for studying the transition from few- to
many-body physics²⁴, thermalization in strongly interacting closed quantum
systems²⁵, or the dynamical emergence of entanglement following
a quantum quench²⁶. Using resonant dipole-dipole interactions
between different Rydberg states²⁷, XY Hamiltonians with long-range
couplings²⁸ could also be implemented. Finally, exploiting the Zeeman
structure of Rydberg states holds the promise of implementing more
complex Hamiltonians, to explore for instance the physics of higher
spins²⁹, or to realize topological insulators³⁰.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Loading of trap arrays. In the single-atom loading regime of optical microtraps,
the probability of having a given trap filled with a single atom is p ≈ 1/2. Therefore,
when we monitor the number of loaded traps in view of triggering the experiment
(Extended Data Fig. 1a), N fluctuates in time around a mean value N_t/2, with
fluctuations ~√N_t.

When the number of traps is small, we can impose, as the triggering criterion, 
waiting until all traps are filled. The average triggering time Ty then increases expo- 
nentially with N, as can be seen in Extended Data Fig. 1b. We used this ‘full-loading 
mode for the data of Fig. 1 (1 <N<9) and Fig. 3 (N=8). This exponential scaling 
sets a practical limit of N~9 for fully loaded arrays. For N=9, the experimental 
duty cycle already exceeds one minute. 

Because of this, for larger N_t we use partially loaded arrays. We set the
triggering threshold in the tail of the binomial distribution of N, that is, close to
N_t/2 + √N_t. This allows us to keep a fast repetition rate for the experiment, of the
order of 1 s⁻¹, enabling fast data collection. Extended Data Fig. 1c shows the
distribution of loaded traps for the 'racetrack' array with N_t = 30 (resp. the N_t = 7 × 7
square array), where we set the triggering condition to N = 20 (resp. N = 30). Using
this triggering procedure, we thus end up with a narrow distribution of atom
numbers N = 20 ± 1.5 (resp. N = 28 ± 1.6), corresponding to a filling fraction of 67%
(resp. 57%), significantly above the average N_t/2. These strongly sub-Poissonian
distributions of atom numbers are such that the variation in N from experiment
to experiment has a negligible effect on the physics studied in Fig. 4; moreover, as
for each experiment the initial configuration image is saved, one can if needed
post-select experiments where an exact number of atoms was involved (this is how
the data in Fig. 2 for N > 10 were obtained).

Recently, several experiments demonstrated quasi-deterministic loading of
single atoms in optical tweezers, reaching p ≈ 90% using modified light-assisted
collisions that lead to the loss of only one of the colliding atoms instead of both.
A preliminary implementation of these ideas on our set-up gave p ≈ 80% for a
single trap. In future work, by using such loading in combination with the real-time
triggering based on the measured number of loaded traps, it seems realistic
to reach, even in large arrays, filling fractions in excess of 0.9, that is, approaching
those obtained in quantum gas microscope experiments using Mott insulators.
Experimental parameters. Extended Data Table 1 summarizes the various values 
of the parameters of the arrays of traps and of the Rydberg states used for the data 
presented in the main text, and the resulting values of the dimensionless parameter α.
It illustrates the wide tunability offered by the system. 
Finite detection errors. Our way to detect that a given atom has been excited to a
Rydberg state relies on the fact that we do not detect fluorescence from the
corresponding trap in the final configuration image. There is however a small probability
ε of losing an atom during the sequence, even if it was in the ground state, thus
incorrectly inferring its excitation to a Rydberg state. These ‘false positive’
detection events affect the measured populations of the N-atom system. One can show
that, if P̃_q is the observed probability of having q Rydberg excitations, and P_p the
actual probability of having p Rydberg excitations:

$$\tilde{P}_q = \sum_{p=0}^{q} \binom{N-p}{q-p}\, \varepsilon^{\,q-p}\, (1-\varepsilon)^{N-q}\, P_p \qquad (4)$$

In principle, one can invert the above linear system relating the observed and actual 
probabilities, to correct the experimental data for the detection errors. Here we
have chosen on the contrary to show the uncorrected populations, and to include 
detection errors on the theoretical curves instead. 

In order to determine the experimental value of ε, we use the initial data points
(τ = 0) of the data of Fig. 2. Since no Rydberg pulse is sent, we have P_0 = 1, and from
equation (4) the observed probability P̃_0(τ = 0) reads (1 − ε)^N. Extended Data
Fig. 2a shows the variation of P̃_0(0) as a function of N, together with a fit which
allows us to extract ε = (3 ± 1)%, the value we use for the theoretical curves in the
main text (see below).
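
As an illustration of how detection errors can be included on the theory side, the following Python sketch applies the false-positive model to a distribution of excitation numbers. It assumes that each ground-state atom is lost independently with probability ε, so that an actual state with p excitations is observed with q ≥ p excitations with the binomial weight C(N − p, q − p) ε^(q−p) (1 − ε)^(N−q). The function name and data layout are illustrative assumptions.

```python
from math import comb


def observed_distribution(actual, eps):
    """Apply false-positive detection errors to an excitation-number distribution.

    actual[p] is the actual probability of p Rydberg excitations among
    N = len(actual) - 1 atoms; eps is the single-atom loss probability.
    Returns the list of observed probabilities for q = 0..N.
    """
    n_atoms = len(actual) - 1
    observed = []
    for q in range(n_atoms + 1):
        total = 0.0
        for p in range(q + 1):
            lost = q - p  # ground-state atoms wrongly counted as Rydberg
            weight = comb(n_atoms - p, lost) * eps**lost * (1 - eps) ** (n_atoms - q)
            total += weight * actual[p]
        observed.append(total)
    return observed
```

For N = 9 atoms with no Rydberg pulse (actual distribution concentrated on p = 0) and ε = 0.03, the observed probability of recapturing all atoms is 0.97^9 ≈ 0.76, and the observed distribution remains normalized.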

Extended Data Fig. 2b shows the effect of this finite value of ε on the observed
probabilities P̃_0, P̃_1 and P̃_2 in the full blockade regime, for atom numbers N = 3, 9 and 15,
clearly illustrating that the ‘false positive’ detection events: first, yield non-zero (and
increasing with N) double excitation probabilities (that oscillate in phase with P_1);
second, multiply the amplitude of P_0 by a factor (1 − ε)^N; and third, reduce the
contrast of the P_1 oscillations. Globally, the experimental data (see Extended Data Fig.
3) show these features, superimposed with other imperfections such as damping,
not related to the finite value of ε.

Finally, we mention the effect of the detection errors on the correlation functions.
In the fully blockaded region k < α, one ideally expects a vanishing g^(2) for
ε = 0. However, even for a small ε, this value is increased substantially (see, for
example, Fig. 4c) to 2ε/f_R, where f_R is the Rydberg fraction. Indeed, g^(2)(k = 1) is
given by an average of quantities of the form ⟨n_i n_{i+1}⟩/(⟨n_i⟩⟨n_{i+1}⟩). For ε = 0, the
numerator vanishes due to blockade; the only possibility of having a non-zero value
comes from detection errors. To lowest order in ε, the probability of getting a
non-zero value for n_i n_{i+1} is that either atom i is in |r⟩ (probability f_R) and atom i + 1 is
lost (probability ε), or vice versa. This results in a value of 2εf_R for the numerator,
while for the denominator we can use the zeroth-order values ⟨n_i⟩ = ⟨n_{i+1}⟩ = f_R,
thus giving g^(2)(1) ≈ 2ε/f_R, which experimentally can be as large as 0.5.
Additional experimental data. Full Rydberg blockade. Extended Data Fig. 3 shows
additional data in the full blockade regime (Fig. 2). In Extended Data Fig. 3a,
the arrays of 1 to 9 traps are fully loaded, while in Extended Data Fig. 3b, the
19-trap triangular array is partially loaded with 10 to 15 atoms. In both panels,
the left column shows the time evolution of the probability P_0 of recapturing all
atoms at the end of the sequence, the middle column shows P_1, and the right
column shows P_2. The points in Fig. 2a corresponding to N = 8 and N = 9 in
partially loaded arrays were taken in a similar configuration as for N = 10 to 15,
but the array contained only N_t = 17 traps. The curves (not shown here) do not
show any noticeable difference with other sets of data. We draw attention to the
following.

First, we recognize the effects of the finite detection errors ε ≠ 0 on the amplitude
and contrast of the collective oscillations discussed above. 

Second, the oscillations exhibit some damping, which seems to increase with N.
To quantify this, we fit the data by the function

$$P(\tau) = a\, e^{-\gamma \tau} \left[ \cos^2(\Omega_N \tau / 2) + b \right] + c \qquad (5)$$

where a, b, c, γ and Ω_N are adjustable parameters (solid lines). This functional
form was chosen to account in a simple way for the asymmetry in the damping.
Extended Data Fig. 3c shows the damping rates γ, extracted from the probabilities
P_0 as a function of N. We observe an initial increase in the damping rates, which
then saturates above N = 5. An increase with N of the damping rate was observed in
other similar blockade experiments. However, even for large numbers of atoms,
the damping rates are small enough that the coherent dynamics dominates over
the relevant experimental timescales. We therefore emphasize that in all the other
figures of the paper (main text and methods), the theory curves are obtained by
disregarding completely any damping, that is, by solving the Schrödinger equation,
not a master equation.
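
The damped-oscillation model used for the fits, P(τ) = a e^(−γτ) [cos²(Ω_N τ/2) + b] + c, can be written as a small Python function. This is only a sketch of the model itself; the parameter values in the usage note are hypothetical and are not fitted values from the data, and the fitting routine used by the authors is not specified in the text.

```python
import math


def damped_blockade_oscillation(tau, a, b, c, gamma, omega_n):
    """Fit model: a * exp(-gamma * tau) * (cos^2(omega_n * tau / 2) + b) + c."""
    envelope = a * math.exp(-gamma * tau)
    return envelope * (math.cos(omega_n * tau / 2) ** 2 + b) + c
```

With this form the maxima follow a e^(−γτ)(1 + b) + c while the minima follow a e^(−γτ) b + c, so the upper and lower envelopes relax towards c at different absolute rates, which is one simple way to capture an asymmetric damping. For example, with the hypothetical values a = 0.8, b = 0.1 and c = 0.05, the model starts at 0.93 for τ = 0.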

Third, we observe that P_2 slowly increases over time for some specific values
of N (see in particular N= 4, 6, 9, 13), corresponding to particular geometries. 

We do not have a full understanding of these last two observations, but they 
may originate from the breaking of the blockade due to the Zeeman structure of 
the Rydberg states D3/2 (see discussion below). 
Eight-atom ring. Extended Data Fig. 4 shows that, within statistical fluctuations, the 
density of excitations on the eight-atom ring is homogeneous (this remains true 
at all times), and that the antiferromagnetic-like or crystal-like features obtained 
for some times, for example, for Ωτ = 3.1, can only be observed in the correlation
functions. This illustrates the interest of our set-up, in which spin chains with
periodic boundary conditions can be realized easily. On the contrary, in a one-dimensional chain with open
boundary conditions, ‘pinning’ of the excitations at specific sites would occur due 
to edge effects. 
Racetrack-shaped array. Extended Data Fig. 5a shows the full time evolution of the
correlation function for the data of Fig. 4a-c (R_b = 4.3a). Extended Data Fig. 5b
corresponds to the same settings except for the fact that one now has R_b = 2.4a.
Square array of 7 × 7 traps. Extended Data Fig. 6 shows the full time evolution of
the two-dimensional Rydberg-Rydberg correlation function g^(2)(k, l) for the 7 × 7
square lattice of Fig. 4d-f. Note that the two-dimensional pair correlation function
is calculated using equation (3), which implies that, due to the finite size of the
array, the number of terms included in the sum decreases when k, l increase. The
normalization takes this variation into account.
Mapping equation (1) on the quantum Ising Hamiltonian. To show the link
between the Hamiltonian of equation (1), reproduced below,

$$H = \sum_i \frac{\hbar\Omega}{2}\, \sigma_x^i + \sum_{i<j} V_{ij}\, n^i n^j \qquad (6)$$

and the quantum Ising model, we simply rewrite the operators n^i as (1 + σ_z^i)/2.
Omitting a constant term, we get:

$$H = \sum_i \frac{\hbar\Omega}{2}\, \sigma_x^i + \sum_i \frac{B_i}{2}\, \sigma_z^i + \sum_{i<j} \frac{V_{ij}}{4}\, \sigma_z^i \sigma_z^j \qquad (7)$$

This is the quantum Ising model, with a transverse field proportional to Ω, and a
local longitudinal field B_i = Σ_{j≠i} V_ij/2 arising from the interactions. In a system with
open boundary conditions, this collective longitudinal field is inhomogeneous,
and can have observable effects. In an infinite lattice, or in a system with periodic
boundary conditions as realized here in one dimension, the longitudinal field is
homogeneous and could be compensated for by applying a global detuning of the
excitation laser.
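
The substitution n^i = (1 + σ_z^i)/2 can be checked numerically with a few lines of Python. The sketch below is an illustration under stated assumptions: it takes random couplings V_ij (hypothetical values), defines B_i = Σ_{j≠i} V_ij/2, and compares the interaction energy Σ_{i<j} V_ij n_i n_j with the Ising form Σ_i (B_i/2) s_i + Σ_{i<j} (V_ij/4) s_i s_j plus the constant Σ_{i<j} V_ij/4, for every classical configuration. Since the transverse-field term is unchanged by the substitution, comparing these diagonal energies is sufficient.

```python
import itertools
import random


def check_ising_mapping(n_sites=4, seed=0):
    """Largest deviation between the Rydberg and Ising forms of the interaction."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_sites), 2))
    # Hypothetical random couplings V_ij for i < j.
    v = {pair: rng.uniform(0.1, 5.0) for pair in pairs}
    # Longitudinal field B_i = sum over j != i of V_ij / 2.
    b = [
        sum(v[tuple(sorted((i, j)))] for j in range(n_sites) if j != i) / 2
        for i in range(n_sites)
    ]
    constant = sum(v.values()) / 4  # term omitted in the Ising Hamiltonian
    max_dev = 0.0
    for n in itertools.product((0, 1), repeat=n_sites):
        s = [2 * ni - 1 for ni in n]  # eigenvalue of sigma_z: +1 for Rydberg
        e_rydberg = sum(v[(i, j)] * n[i] * n[j] for i, j in pairs)
        e_ising = (
            sum(b[i] / 2 * s[i] for i in range(n_sites))
            + sum(v[(i, j)] / 4 * s[i] * s[j] for i, j in pairs)
            + constant
        )
        max_dev = max(max_dev, abs(e_rydberg - e_ising))
    return max_dev
```

The returned deviation is at the level of floating-point rounding, confirming that the two forms differ only by a constant energy offset.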

Anisotropy of the interaction. For a pair of atoms in a nD_{3/2} Rydberg state with
the internuclear axis not aligned with the quantization axis, the rigorous description
of the van der Waals interaction requires the inclusion of all the Zeeman
sublevels; the interaction then takes the form of a 16 × 16 matrix. To keep the
description of a system of N atoms tractable, one can, in the blockade regime,
define an effective, anisotropic van der Waals potential, reducing the previous
matrix to a single scalar. For nD_{3/2} states, the anisotropy reported in refs 7 and 11
is well reproduced by the simple expression

$$V_{\rm eff}(r, \theta) = \frac{C_6(0)}{r^6} \left[ \frac{1}{3} + \frac{2}{3} \cos^4\theta \right] \qquad (8)$$

with θ the angle between the quantization axis and the internuclear axis, giving a
reduction by a factor of three in interaction strength when θ goes from 0 to π/2
(see inset of Fig. 1a).

Owing to the anisotropy in equation (8), the shape of the blockade volume
centred on a Rydberg atom is also anisotropic. However, because of the r^(−6) scaling
of the interaction, the surface r(θ) defined by V_eff(r, θ) = ħΩ is quite well
approximated by a prolate spheroid with an aspect ratio of 3^(1/6) ≈ 1.2. In Figs 1, 3 and 4,
the shaded regions depicting the blockade volume have the polar equation

$$r(\theta) = R_b \left( \frac{1}{3} + \frac{2}{3} \cos^4\theta \right)^{1/6}.$$
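
The weak anisotropy of the blockade surface can be made concrete with a short Python sketch. It assumes the angular factor 1/3 + (2/3) cos⁴θ for the effective potential and expresses distances in units of the on-axis blockade radius R_b; the function names are illustrative.

```python
import math


def angular_factor(theta):
    """Angular dependence of the effective van der Waals potential."""
    return 1.0 / 3.0 + (2.0 / 3.0) * math.cos(theta) ** 4


def blockade_radius(theta, r_b=1.0):
    """Radius at which the effective potential equals hbar * Omega."""
    return r_b * angular_factor(theta) ** (1.0 / 6.0)


# Ratio of the blockade radius along the quantization axis to that perpendicular to it.
aspect_ratio = blockade_radius(0.0) / blockade_radius(math.pi / 2)
```

Although the interaction strength drops by a factor of three between θ = 0 and θ = π/2, the sixth root reduces this to an aspect ratio of 3^(1/6) ≈ 1.2 for the blockade surface.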
Numerical simulation of the dynamics. Our theoretical description of the system
is based on the mapping of its dynamics into a pseudo-spin 1/2 model with anisotropic
long-range interactions. We therefore neglect the rich Zeeman structure of
the nD_{3/2} states. The numerical calculations rest on the solution of the Schrödinger
equation for the Hamiltonian of equation (1) of the main text in a reduced Hilbert
space ℋ. We first write the wavefunction |ψ⟩ of the system with N atoms in terms
of states with a fixed number of Rydberg excitations and ground state atoms, which
correspond to the eigenstates of the Hamiltonian with vanishing Rabi frequency Ω
(refs 17, 33). Then the truncation procedure is based on two complementary steps:
first we define the maximum number of Rydberg excitations N_r^max that we include
in our basis, then second we eliminate those states which display excitations closer
than a fixed distance R_0. Both N_r^max and R_0 are adjusted to ensure the convergence
of the dynamics. For small samples (Fig. 3) we performed simulations including
all 256 basis states, whereas for the racetrack configurations we typically set R_0
smaller than the lattice constant but include up to N_r^max = 10 excitations at most,
reducing the dimension of ℋ from 2^20 ≈ 10^6 to Σ_{q=0}^{N_r^max} (20 choose q) ≈ 6 × 10^5. For the 7 × 7
square array with 30 atoms, we set R_0 = 1.3a (much smaller than the blockade
radius R_b = 2.6a), thus reducing the dimension of ℋ to ~3 × 10° (the full Hilbert
space is of dimension 2^30 ≈ 10^9, and using only the truncation criterion on the
number of excitations would reduce it to about 5 × 10^7, still intractably large). The
Schrödinger equation within the truncated Hilbert space is then solved with a
standard split-step method for the two non-commuting parts of the Hamiltonian
of equation (1). All these calculations were repeated for several realizations of the
loading of the arrays (50 realizations for the square 7 × 7 configurations and 200
realizations for the case with fewer traps), taking into account the anisotropic
interparticle interaction of equation (8). The comparison with experimental data
of the average fraction of excitations f_R = Σ_q q P_q/N is done by including the
“false positive” detection events as described by equation (4).
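
The two truncation criteria can be illustrated with a minimal Python sketch. The first function counts basis states with at most a given number of excitations; the second additionally discards states containing two excitations closer than a minimum distance, by brute-force enumeration that is only practical for small arrays. The function names and the explicit enumeration are assumptions made for illustration and do not reproduce the authors' code.

```python
import itertools
import math
from math import comb


def truncated_dimension(n_atoms, max_excitations):
    """Number of basis states with at most max_excitations Rydberg excitations."""
    return sum(comb(n_atoms, q) for q in range(max_excitations + 1))


def count_allowed_states(positions, r0, max_excitations):
    """Count states with no two excitations closer than r0 (brute force)."""
    count = 0
    for q in range(max_excitations + 1):
        for subset in itertools.combinations(positions, q):
            if all(math.dist(p1, p2) >= r0
                   for p1, p2 in itertools.combinations(subset, 2)):
                count += 1
    return count
```

For 20 atoms and at most 10 excitations, `truncated_dimension` gives 616,666, that is, about 6 × 10^5. If the same cap of 10 excitations is assumed for 30 atoms (an assumption, since the cap used for the square array is not stated), the count is 53,009,102, of the order of the 5 × 10^7 quoted above. The distance criterion then removes many more states: on a 2 × 2 plaquette of unit spacing with R_0 = 1.3, only 7 of the 16 states survive.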

The calculation of the g^(2)(k) correlation function in Figs 3d and 4c follows the
definition of equation (2). However, in contrast with the calculation of the average
fraction of the excitations it is not possible to derive an analytical formula for g^(2)(k)
to properly take into account the detection efficiency of Rydberg excitations (unless
k < α, as described in section ‘Finite detection errors’). Therefore we implement a
standard Monte Carlo algorithm to perform the average of the correlation function
over randomly generated configurations which are weighted in g^(2)(k) with
the initial (quantum) probability extracted from the real-time dynamics of the
Schrödinger equation. For example, the state |r_i r_j⟩ which contains N_r = 2 Rydberg
excitations and has amplitude c_ij(t) can wrongly be detected as the state |r_i r_j r_k⟩ with
probability P = ε(1 − ε)^(N−3). If the latter state is generated from our sampling
algorithm then its weight in the correlation function corresponds to |c_ij(t)|². Finally we
average over several hundred randomly generated configurations to obtain well
converged results for the correlation function.

Effect of partial loading of large arrays on the observed dynamics. Using the sim- 
ulations described above, we explore to what extent the partial loading of our larger 
arrays may change the observed dynamics as compared to the ideal case of full loading. 

Extended Data Fig. 7 shows, for the ‘racetrack’ array of Fig. 4a-c, the results
of simulations for the experimentally relevant case of partial loading (solid lines,
filling fraction η ≈ 0.67) and for the ideal, full loading case (thin dashed lines).

First, Extended Data Fig. 7a shows the time evolution of the Rydberg
fraction f_R. The dynamics is qualitatively similar in the two situations, with initial
oscillations that rapidly get damped owing to the dephasing of the many
incommensurate eigenenergies of the Hamiltonian. Quantitatively, the initial oscillations
are faster in the fully loaded case: this is expected, as each blockade volume contains
1/η as many atoms, and thus, due to the square-root scaling of the collective Rabi frequency
with the number of atoms in a blockade volume, we expect an enhancement of
the oscillation frequency by ~η^(−1/2) ≈ 1.2, close to what we observe. In the same
way, the asymptotic Rydberg fraction when τ → ∞ is reduced by a factor close to
the expected factor η.

Second, Extended Data Fig. 7b shows the pair correlation function g^(2)(k) for
Ωτ = 2.0. Here again, the changes are moderate, although the oscillations of the
correlation function for k > α would be slightly more contrasted for the fully loaded
array.

Simulations for the other large array settings give similar results, allowing us 
to safely conclude that the partial loading of our largest arrays does not affect 
significantly the observed dynamics. This conclusion would be different for other 
types of experiments, for instance the transport of a spin excitation in the case of 
resonant-dipole-dipole interactions. 

Approximate translational invariance. For the one-dimensional configurations 
of the main text (eight-atom ring of Fig. 3b and racetrack-shaped array of 30 traps 
of Fig. 4a) we plot the spatially averaged pair correlation function 


$$g^{(2)}(k) = \frac{1}{N_t} \sum_i \frac{\langle n_i n_{i+k} \rangle}{\langle n_i \rangle \langle n_{i+k} \rangle} \qquad (9)$$

where the subscripts label sites. For a system invariant by translation, all terms 
in the sum are identical, and the averaging over i simply improves the signal to 
noise ratio. However, our systems are not translationally invariant, in particular 
because of the anisotropy of the interaction, and a natural question to address is 
whether the averaging reduces the contrast of the correlation functions. To answer 
this question, we have calculated the dynamics of the pair correlation function 
for the eight-atom ring, taking or not taking into account the anisotropy of the 
interaction (Extended Data Fig. 8). We observe that the contrast reduction due 
to averaging is very small, thereby validating our choice to perform it for the data 
shown in the main text. 
Effective loss mechanism arising from anisotropic interactions of D states. The 
agreement between our measurements and the results of the simulations is not 
perfect for the largest excitation times, in particular for some settings (for example, 
for some configurations in the full blockade regime, for the eight-atom ring in the 
partial blockade regime, and for the 7 x 7 square array), where we observe a gradual 
increase in the number of measured Rydberg excitations. 

These effects could be qualitatively reproduced if the detection errors ε
increased in time. However, the main reason for these losses is that the microtraps
are switched off during the excitation (to avoid inhomogeneous light-shifts), and
as they are off for a fixed amount of time (3 μs), independent of τ, we do not, at first
sight, expect ε to increase in time. One could imagine however that the presence of
the Rydberg excitation lasers may induce extra loss (due to off-resonant scattering
for instance), and in this case one would end up having an ε increasing with τ. We
have experimentally ruled out this possibility by measuring the recapture probability
when shining the Rydberg excitation lasers, detuned from the Rydberg line
by ~100 MHz, for the full 3 μs, without measuring any detrimental effect.

A second possible reason would be the motion of the atoms. Owing to their finite
temperature, the atoms move during free flight with a velocity v ~ 50 nm μs⁻¹. Now,
strictly speaking, the terms corresponding to the laser coupling in equation (1)
of the main text are not (ħΩ/2) σ_x^i, but (ħΩ/2) e^(i k·r_i(t)) |r_i⟩⟨g_i| + h.c., where k is the sum of the
wavevectors of the excitation lasers at 795 and 475 nm, and r_i(t) the position of
atom i. Thus, because of the motion, the phase factors of the couplings become
time-dependent, which, for example, yields a dephasing of the spin wave
corresponding to |W⟩ states. However, a numerical simulation of this effect shows that
the induced dephasing rates are negligible for our parameters.

We thus believe that the cause of the observed extra losses lies in the large
number of interacting Zeeman sublevels when two atoms are excited to nD_{3/2}
states: for θ ≠ 0, all 16 pair state Zeeman sublevels are coupled together by the van
der Waals interaction. For a large number of atoms, this may lead to an effective 
loss rate from the targeted |r) states into a quasi-continuum comprising all other 
(weakly interacting) Zeeman states, and hence to a gradual increase of popula- 
tion of the Rydberg manifold. Qualitatively, this interpretation is corroborated 
by the fact that the observed increase in the number of excitations depends quite 
strongly on the array geometry: for instance, the data from the racetrack-shaped 
array (Fig. 4b), for which a majority of interacting atom pairs are almost aligned 
along the quantization axis z, are well reproduced by the simulations even at 
long times, unlike in the case of the eight-atom ring or the 7 × 7 square array
(Figs 3b and 4e), for which many interacting pairs have their internuclear axes
strongly inclined with respect to z. 

A full simulation of the dynamics of N atoms including the full Zeeman structure
of nD_{3/2} states is challenging (neglecting hyperfine structure, the dimension
of the Hilbert space is 5^N, as we have 5 states per atom: the ground state and the
four Zeeman sublevels of the Rydberg state), and is beyond the scope of this paper.
However, in order to test the hypothesis described above, we have simulated the
following minimal ‘toy model’ displaying the effect of a Zeeman structure: we
consider a system of N = 6 atoms in a line, aligned either along the quantization axis
z (θ = 0) or along y (θ = π/2), and excited to the Rydberg state nP_{1/2}. The Hilbert
space is then of size 3^6, and we can perform exact diagonalization. The van der
Waals Hamiltonian between two atoms (a 4 × 4 matrix which depends on θ) was
taken from equations (4) and (5) of ref. 7.

The red solid lines in Extended Data Fig. 9 show the time evolution of the
fraction of Rydberg atoms (whatever the Zeeman state) f_R, calculated using this
Hamiltonian with the full Zeeman structure. The black dotted lines show f_R(t)
when modelling the system by an assembly of spin-1/2 particles, with an interaction
given by the effective potential analogous to equation (8) but calculated for
nP_{1/2} states following ref. 7 (for the parameters of the figure, the effective potential
is four times as high when θ = π/2 as when θ = 0). As expected, for θ = 0, the
two approaches give exactly the same results (Extended Data Fig. 9a). However,
for θ = π/2, we observe that the two simulations, while giving similar results at
short times, disagree significantly at larger times, with the nP_{1/2} structure yielding
(as observed in some of our experiments, see for example, Fig. 4e) an increased
Rydberg fraction, as the population of the Zeeman state not directly coupled to
|g⟩ by the laser (blue dashed line) slowly increases. On the basis of this simulation,
we can thus conclude that in some cases, the anisotropic character of the interaction
and the complex Zeeman structure can lead to the observed discrepancy between
our data and the modelling by spin-1/2 particles interacting via an effective
potential.

We note however that this behaviour is not universal and that, for a given geometry 
and a given Rabi frequency, varying the value of n, as well as that of the magnetic 
field, affects the degree of agreement between the exact dynamics and the 
simplified spin-1/2 model. In Extended Data Fig. 9, for instance, we chose n = 30,
B = 0.2 G, and a spacing between the atoms of a = 2 or 1.6 μm, to display a behaviour
qualitatively similar to that of Figs 3b and 4e. This non-universal character
suggests that for nD_{3/2} states also, a careful choice of parameters (n, spacing,
B field) may allow a very good agreement between the observed dynamics and that
of a spin-1/2 model with anisotropic interactions even in geometries where many
pairs lie at a large angle from the quantization axis. A comprehensive study of those 
conditions, along the lines of refs 7, 8, 34, is very important in view of quantum 
simulation applications, but it is however beyond the scope of this paper and will 
be the subject of future work.

Extended Data Figure 1 | Full and partial loading of arrays. a, Sketch
of the experimental sequences. During loading, the camera images are
analysed continuously to extract the number of loaded traps. As soon
as a triggering criterion is met, the loading is stopped and an image of
the initial configuration is acquired. After Rydberg excitation, a final
image is acquired, revealing the atoms excited to Rydberg states (green
disks at bottom right). b, Average triggering time T_N when the triggering
criterion is set to N = N_t: achieving full loading requires an exponentially
long time, limiting the method in practice to N_t ≤ 9. The triggering times
can vary substantially depending on the density of the magneto-optical
trap used to load the array, and the data points shown here correspond
to typical conditions used for the data of the main text. Error bars, s.e.m.
c, Probability p_N of having a number N of loaded traps in the partially
loaded regime for the 30-trap ‘racetrack’ (left) and the 49-trap square array
(right; blue dots). The shaded distributions correspond to what would
be observed with random triggering. For this partial loading regime, the
triggering rate of the experiment is about 1 s⁻¹.

Extended Data Figure 2 | Effect of detection errors. a, Experimental
determination of ε. From the data of the full blockade experiments (Fig. 2
of main text), we plot the probability P_0 of recapturing all N atoms for
τ = 0. The solid line is a fit to the expected dependence (1 − ε)^N, giving
ε = 3% (the shaded area corresponds to 2% < ε < 4%). Error bars, s.e.m.
b, Calculated probabilities of observing 0, 1 or 2 excitations (columns 1-3)
as a function of the excitation pulse area Ωτ, assuming a perfect blockade
and ε = 3%, for atom numbers N = 3, 9, 15 (rows 1-3).

Extended Data Figure 3 | Full data set for the Rydberg blockade data.
a, Fully loaded arrays of 1 to 9 traps (n = 82). b, Partially loaded array
of N_t = 19 traps, containing from N = 10 to N = 15 atoms (n = 100). The
column on the left shows the probability P_0 of recapturing all atoms, the
centre column the probability P_1 of losing just one atom out of N, and
the column on the right the probability P_2 of losing two atoms out of N.
The solid lines are fits by equation (5). Error bars, s.e.m. c, Damping rate
extracted from the P_0 data as a function of the number of atoms in the
array. Error bars, s.d.

Extended Data Figure 4 | Homogeneous excitation in the eight-atom ring.
a, For Ωτ = 3.1, we observe strongly contrasted oscillations in the pair
correlation function g^(2)(k). b, The average density of Rydberg excitations,
however, is approximately the same on every site. The horizontal dashed line
indicates the mean over all sites. Error bars, s.e.m.

Extended Data Figure 5 | Full time evolution of the correlation
functions for the 30-trap, racetrack-shaped chain. a, Same as for Fig. 3a-c.
The right panel is the time evolution of the pair correlation function,
clearly showing that, for times longer than a few Ω⁻¹, the pair correlation
function does not evolve significantly. The vertical dashed line indicates
the value of the blockade radius. b, The principal quantum number is now
n = 57, and the Rabi frequency Ω = 2π × 1.7 MHz, such that R_b = 2.4a. The
central panel shows the time evolution of the Rydberg fraction, and the
right panel the time evolution of the pair correlation function. For both
a and b, f_R approaches, at long times, the close-packing limit a/R_b of hard
rods of length R_b (dashed horizontal lines).

Extended Data Figure 6 | Full time evolution of the experimental correlation
function for the 7 × 7 square array. One observes the blockaded region
around (k, l) = (0, 0), with a slight flattening reflecting the anisotropy of the
interaction. After a few Ω⁻¹, the correlation function g^(2)(k, l) does not evolve
any more.

Extended Data Figure 7 | Full versus partial loading for the dynamics and
correlations in the case of Fig. 4a-c. a, Rydberg fraction as a function of
time for the partially loaded (solid line) or fully loaded (thin dashed line)
30-trap array. b, Pair correlation function g^(2)(k) for Ωτ ≈ 2.0, for the partially
loaded (solid line) or fully loaded (thin dashed line) 30-trap array. In both
cases, the effect of detection errors (ε = 3%) is included.

Extended Data Figure 8 | Assessing the validity of the approximation
of translational invariance in the eight-atom ring. Calculated pair
correlation function g^(2)(k) as a function of the excitation time for
the eight-atom ring. a, Simulation using the experimentally relevant
anisotropic interaction, which breaks translational invariance.
b, Simulation with the same parameters as in a, except that the angular
dependence is neglected (we replace equation (8) by its value for θ = 0),
thus re-establishing translational invariance. We observe that the contrast
in a is reduced, as expected, but only in a marginal way.

Extended Data Figure 9 | Effect of the Zeeman structure of the Rydberg
states on the dynamics of the Rydberg fraction f_R. We use the toy
model of nP_{1/2} Rydberg states discussed in Methods, with n = 30 and
B = 0.2 G. a, The atoms are aligned along the quantization axis z, and
spaced by a = 1.6 μm (inset). In this case, the full solution including the
Zeeman structure (red solid line) agrees perfectly with the solution of the
effective spin-1/2 model with an anisotropic effective potential (as used
in all the rest of the paper, black dotted line). b, The atoms are aligned
perpendicularly to z, with a = 2 μm (inset; thus keeping the same effective
potential interaction between adjacent atoms as in a). The full (red solid
line) and approximate (black dotted line) solutions agree at short times,
but for longer times some population builds up in the other Zeeman
sublevel that is not directly coupled to |g⟩ by the laser (blue dashed line),
resulting in a slowly increasing excess of Rydberg fraction similar to what
is observed experimentally for some configurations.

Extended Data Table 1 | Experimental parameters used for the data presented in the main text

               Trap array parameters            Rydberg state parameters
Figure         Spacing a   N_t    N             n      Calculated C_6/h   Ω/(2π)   R_b     α
               (μm)                                    (GHz μm^6)         (MHz)    (μm)
2a (full)      3.0         1-9    N_t           82     −8.9 × 10^3        1.5      14      4.5
2a (partial)   3.2         19     10-15         100    −8.0 × 10^4        1        20      6.4
3a             6.3         8      8             54     −6.7               1.6      4.0     0.63
3b             6.3         8      8             61     −7.6 × 10^2        1.3      9.1     1.4
3c             3.8         8      8             100    −8.0 × 10^4        0.95     21      5.5
4a,b,c         3.1         30     20 ± 1.5      79     −6.0 × 10^3        1.0      13.5    4.3
4d,e,f         3.5         49     28 ± 1.6      61     −7.6 × 10^2        1.4      9.1     2.6

Wide tuning of α = R_b/a, over one order of magnitude, is achieved by a combination of changes in a and n (while Ω is kept almost constant).

Synthetic Landau levels for photons


Nathan Schine¹, Albert Ryou¹, Andrey Gromov², Ariel Sommer¹ & Jonathan Simon


Synthetic photonic materials are an emerging platform for
exploring the interface between microscopic quantum dynamics
and macroscopic material properties. Photons experiencing
a Lorentz force develop handedness, providing opportunities to
study quantum Hall physics and topological quantum science.
Here we present an experimental realization of a magnetic field for
continuum photons. We trap optical photons in a multimode ring
resonator to make a two-dimensional gas of massive bosons, and
then employ a non-planar geometry to induce an image rotation
on each round-trip. This results in photonic Coriolis/Lorentz and
centrifugal forces and so realizes the Fock-Darwin Hamiltonian
for photons in a magnetic field and harmonic trap. Using
spatial- and energy-resolved spectroscopy, we track the resulting
photonic eigenstates as radial trapping is reduced, finally observing
a photonic Landau level at degeneracy. To circumvent the challenge
of trap instability at the centrifugal limit, we constrain the
photons to move on a cone. Spectroscopic probes demonstrate flat
space (zero curvature) away from the cone tip. At the cone tip, we
observe that spatial curvature increases the local density of states,
and we measure fractional state number excess consistent with the
Wen-Zee theory, providing an experimental test of this theory
of electrons in both a magnetic field and curved space. This
work opens the door to exploration of the interplay of geometry
and topology, and in conjunction with Rydberg electromagnetically
induced transparency, enables studies of photonic fractional
quantum Hall fluids and direct detection of anyons.

The Lorentz force on a charged particle moving in a magnetic field 
leads to the unique topological features of quantum Hall systems, 
including precisely quantized Hall conductance, topologically pro- 
tected edge transport, and, in the presence of interactions, the predicted 
anyonic and non-Abelian braiding statistics that form the basis of topo- 
logical quantum computing”®. To controllably explore the emergence of 
these phenomena, efforts have recently focused on realizing synthetic 
materials in artificial magnetic fields, and in particular, upon imple- 
mentations for cold atoms and photons. Successful photonic imple- 
mentations have employed lattices with engineered tunnelling®?!~*. 
However, it is desirable to realize artificial magnetic fields in the simpler 
case of a continuum (lattice-free) material”?>°, where strong inter- 
actions are more easily accessible and the theory maps more directly 
to fractional quantum Hall systems. In this work, we develop a new 
approach and demonstrate the first continuum synthetic magnetic field 
for light. 

To achieve photonic Landau levels we harness the powerful analogy 
between photons in a near-degenerate multimode cavity and massive, 
trapped 2D particles*”*. Owing to mirror curvature, the transverse 
dynamics of a running wave resonator are equivalent to those of a 2D 
quantum harmonic oscillator (Fig. 1a). Non-planar reflections cause 
the transverse properties of the light field—for example, field profile 
(image) and polarization vectors—to rotate by an angle @ upon a round 
trip (Fig. 1b). Polarization rotation splits the energy of circularly polar- 
ized eigenmodes, while image rotation, in analogy to a rotating frame, 
introduces Coriolis and centrifugal forces. As the anti-confinement
from the rotation compensates the confinement from the mirror
curvature, we are left primarily with a Coriolis force, or equivalently, a
Lorentz force. When dynamics are coarse-grained over many round-trips,
we arrive at the Fock-Darwin Hamiltonian (see Supplementary Information)

$$H_{\mathrm{FD}} = \frac{1}{2m}\left(\mathbf{p} - \frac{(qB)^{\mathrm{syn}}}{2}\,\hat{\mathbf{z}}\times\mathbf{r}\right)^{2} + \frac{1}{2}\,m\,\omega_{\mathrm{trap}}^{2}\,r^{2},$$

where $m$ is the dynamical particle mass, $\mathbf{p}$ is the particle's
transverse momentum vector, $\mathbf{r}$ is the particle's transverse position
vector, $\hat{\mathbf{z}}$ is the longitudinal unit vector, and
$\omega_{\mathrm{trap}}/2\pi$ is the (residual) harmonic trapping frequency. The
synthetic magnetic field is $(qB)^{\mathrm{syn}} = [\ldots]$ for small angles $\theta$,
where $L_a$ and $\theta$ are the on-axis resonator length and opening half-angle
(Fig. 1c), and $\lambda$ is the wavelength of light. When the resonator length
is tuned to eliminate residual harmonic trapping, only a Lorentz force
remains, and the Hamiltonian describes massive particles in Landau
levels, where the $n$th Landau level has energy $\hbar\omega_c(n + \tfrac{1}{2})$,
with $\omega_c$ being the cyclotron frequency, and consists of states with
angular momentum $l = -n, -n+1, \ldots$ in units of the angular momentum
quantum $\hbar$. The synthetic magnetic field is then equivalently given by
$(qB)^{\mathrm{syn}}/\hbar = 4/w_0^2$, that is, one flux quantum per area
$\pi w_0^2/4$, where $w_0$ is the resonator $l = 0$ mode waist ($1/e^2$ intensity
radius). The magnetic length $l_B$ may therefore be identified as $w_0/2$.
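The collapse of the trapped spectrum into Landau levels can be illustrated with the textbook Fock-Darwin eigenenergies. The sketch below is a hedged illustration rather than the authors' analysis: it assumes the standard closed-form spectrum with radial quantum number `n_r` and angular momentum `l`, a sign convention chosen so that the nth Landau level contains l = −n, −n+1, …, and units with ħ = 1. The function names are hypothetical.

```python
import math


def fock_darwin_energy(n_r, l, omega_c, omega_trap, hbar=1.0):
    """Textbook Fock-Darwin eigenenergy for radial number n_r and angular momentum l.

    Assumed sign convention: states with l >= 0 and n_r = 0 form the lowest
    Landau level when omega_trap = 0.
    """
    big_omega = math.sqrt(omega_trap**2 + omega_c**2 / 4.0)
    return hbar * ((2 * n_r + abs(l) + 1) * big_omega - l * omega_c / 2.0)


def landau_index(n_r, l):
    """Landau level index n reached by the state (n_r, l) as omega_trap -> 0."""
    return n_r + (abs(l) - l) // 2


if __name__ == "__main__":
    omega_c = 1.0
    # With no residual trap, every state sits at omega_c * (n + 1/2).
    for n_r, l in [(0, 0), (0, 3), (0, -1), (1, 2), (1, -1)]:
        energy = fock_darwin_energy(n_r, l, omega_c, 0.0)
        print(f"n_r={n_r}, l={l:+d}: E = {energy:.3f}, n = {landau_index(n_r, l)}")
    # A weak trap splits neighbouring l states by about omega_trap**2 / omega_c.
    omega_trap = 0.01
    splitting = (fock_darwin_energy(0, 1, omega_c, omega_trap)
                 - fock_darwin_energy(0, 0, omega_c, omega_trap))
    print(f"splitting = {splitting:.3e}, omega_trap^2/omega_c = {omega_trap**2 / omega_c:.3e}")
```

In this convention the within-level splitting for a weak trap approaches $\omega_{\mathrm{trap}}^2/\omega_c$, the same scaling quoted in the Figure 2 legend.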

Although Landau levels exhibit ‘topological protection’ against localized
disorder, long-range potentials may guide the particles to infinity,
inducing loss!!°. In our system, the dominant source of long-range 
disorder is trap asymmetry (astigmatism) that arises from mirror 
imperfections and off-axis reflection and drives $\Delta l = \pm 2$ transitions
(see Supplementary Information). We circumvent this by imposing 
an additional discrete three-fold rotational symmetry on our Landau 
levels. To achieve this, we carefully balance transverse and longitudinal 
energy scales such that only every third angular momentum state is 
degenerate (see Supplementary Information). 

The three-fold symmetry of the Landau levels induces a conical 
geometry on the 2D space for transverse photon dynamics. To see this, 
consider a particle which leaves the edge of a particular 120° wedge of 
the plane; the discrete rotational symmetry requires it to appear on 
the other side, which is equivalent to wrapping this wedge into a cone 
(Fig. 1d). Working away from the apex of the cone gives access to flat 
space Landau levels with every angular momentum state accessible, 
while working near the apex allows experimental investigation of par- 
ticle dynamics near a singularity of spatial curvature. 

Our experimental resonator consists of four mirrors with nominal
radii of curvature $R = (2.5, 5, 5, 2.5)$ cm arranged as shown in Fig. 1c,
and has an $l = 0$ mode finesse of $2.0 \times 10^{4}$. The on-axis length
$L_a = 1.816$ cm and the opening half-angle $\theta = 16°$ were chosen to create
a photonic Landau level while minimizing residual astigmatism.
Varying the resonator length by ~20 µm adjusts the splitting between
states by ~1 MHz (see Supplementary Information). Tuning this splitting
to zero results in a free spectral range at degeneracy of
$\nu_{\mathrm{FSR}} = 3.8209(2)$ GHz. The resonator has an $l = 0$ waist size
$w_0 = 43$ µm and a cyclotron frequency $\omega_c = 2\pi \times 2.1671(2)$ GHz,
which together yield a photon dynamical mass of

$$m_{\mathrm{dyn}} = \frac{4\hbar}{\omega_c w_0^{2}} = 1.84 \times 10^{-5}\, m_e,$$

where $m_e$ is the electron mass.
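The quoted dynamical mass follows from combining $\omega_c = (qB)^{\mathrm{syn}}/m$ with $(qB)^{\mathrm{syn}}/\hbar = 4/w_0^2$. The sketch below reproduces the arithmetic from the stated waist and cyclotron frequency; the function names are illustrative, and the physical constants are standard CODATA values rather than numbers taken from the source.

```python
import math

HBAR = 1.054571817e-34        # reduced Planck constant, J s (CODATA)
M_ELECTRON = 9.1093837015e-31  # electron mass, kg (CODATA)


def magnetic_length(w0_m):
    """Magnetic length l_B = w0 / 2, as identified in the text."""
    return w0_m / 2.0


def dynamical_mass(omega_c, w0_m):
    """m_dyn = (qB)/omega_c with (qB)/hbar = 4 / w0**2."""
    return 4.0 * HBAR / (omega_c * w0_m**2)


if __name__ == "__main__":
    w0 = 43e-6                         # l = 0 mode waist, m
    omega_c = 2 * math.pi * 2.1671e9   # cyclotron frequency, rad/s
    m_dyn = dynamical_mass(omega_c, w0)
    print(f"l_B   = {magnetic_length(w0) * 1e6:.1f} um")
    print(f"m_dyn = {m_dyn:.3e} kg = {m_dyn / M_ELECTRON:.3e} m_e")
```

With the values stated in the text this gives a magnetic length of 21.5 µm and a mass ratio that rounds to the quoted $1.84 \times 10^{-5}$.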
Figure 1 | Resonator structure and transverse manifold geometry.

a, Top, ray trajectories (black lines) in a curved mirror resonator oscillate 
transversely (green arrows). In a particular transverse plane, the 
stroboscopic time evolution of the ray positions samples a harmonic 
oscillator trajectory (blue points). In paraxial optics, the solutions for the 
transverse modes are Hermite-Gauss profiles (red curve). The transverse 
degrees of freedom of a resonator are precisely those of a 2D quantum 
harmonic oscillator (bottom). b, Top, as a four mirror resonator is made 
non-planar (purple arrows), the light rays are induced to rotate (blue 
arrow) about the optic axis. In the transverse plane (represented below), 
this corresponds to flattening the 2D harmonic potential (centrifugal 
force) and the introduction of an effective magnetic field (Coriolis force). 


In practice, we tune our resonator to degeneracy by varying its 
length, which primarily changes the harmonic trapping without chang- 
ing the effective magnetic field, and we track the energy spectrum and 
spatial profiles of resonator modes by observing the transmission of 
circularly polarized light with a holographically programmed spatial 
profile (Fig. 2, see Supplementary Information). Figure 2a shows the 
evolution of a number of mode energies in numerous Landau levels 
as we adjust the resonator length over almost a centimetre. Using the 
observed mode-profiles (shown as insets), we identify the four lowest 
modes in the figure as those comprising the lowest conical Landau 
level, and centre the graph on their approximate degeneracy point. 
Figure 2b shows high-resolution spectroscopy of a larger number of 
modes in the lowest Landau level near the length where the harmonic 
confinement is cancelled. We precisely extract the change in resonator 
length from the spectroscopically measured free spectral range and 
compensate the residual harmonic trapping to zero. At this point, 
the residual non-degeneracy comes from local disorder, which causes 
an observed level repulsion for high angular momentum states (Fig. 2b, 
main panel) that is not observed at lower angular momentum 
(Fig. 2b, top inset) as well as a significant reduction in mode lifetime 
(Fig. 2c). Away from degeneracy the modes are nearly ideal rings with
$2\pi \times l$ phase winding (experimentally determined by varying the phase
profile of the injected light, see Supplementary Information); at degeneracy
these modes mix due to local disorder potentials (Fig. 2d). This
effect is apparent because of the long particle lifetime (high finesse of 
our resonator) and, in only causing mode distortion, is qualitatively 
different from global potentials such as astigmatism that cause mode 
deconfinement (see Supplementary Information). The local disorder 
merely creates chiral, localized states; it does not break topological 
c, Our non-planar resonator consists of four mirrors (blue and purple) in a
stretched tetrahedral configuration of on-axis length $L_a$ and opening
half-angle $\theta$. The image rotates about the optic axis (red) on every round trip.
d, Left, we depict the transverse plane at the resonator waist pierced by a
uniform perpendicular (along $\hat{\mathbf{z}}$) magnetic field B of magnitude B, and
show a generic profile (red curve) with three-fold symmetry. When the 
plane is cut arbitrarily into three equal sections, the entire profile is fully 
determined within any one-third section of the plane: when a trajectory 
leaves one side of a section, it reappears on the other side. Each section 
may be wrapped into a cone on which the original profile appears once 
(right; this would be true for any discrete rotational symmetry). The 
effective magnetic field is everywhere perpendicular to the cone’s surface. 


protection so long as it only mixes modes within a single Landau level 
and, in an interacting system, is weaker than the interactions. This 
insensitivity to weak disorder is a notable advantage of our set-up as 
compared to, for example, injecting angular momentum modes into a 
two mirror resonator!” (see Supplementary Information). 

To demonstrate our system’s stability out to large displacements from 
the cone tip, Fig. 3a, b shows large-angular-momentum orbits. Figure 3a 
presents a large displaced state composed of modes with angular 
momentum up to $l \approx 60$, which exhibits three-fold symmetry and
interferes with itself, producing a strongly fringed pattern due to the 
rapid phase winding of each ring. Figure 3b unwraps another large- 
angular-momentum mode showing that if an orbit encircles the cone 
tip, then it must do so three times, as a consequence of the three-fold 
symmetry. 

Remarkably, photons in our resonator may live on three
distinct cones, distinguished by additional magnetic flux threaded
through their tips. To understand this, note that the planar lowest
Landau level may be spanned by angular momentum states
$\psi_l(z \equiv (x + iy)/w_0) \propto z^{l}\exp(-|z|^{2})$ for $l = 0, 1, 2, \ldots$,
with the transverse position vector $\mathbf{r} = (x, y)$. In our resonator
these are partitioned into three separately degenerate sets corresponding
to lowest Landau levels on different cones. These sets are the
$l = 0, 3, 6, \ldots$ modes, the $l = 1, 4, 7, \ldots$ modes, and the
$l = 2, 5, 8, \ldots$ modes and satisfy the angular symmetry condition
$\psi(\theta + 2\pi/3) = e^{2\pi i c/3}\,\psi(\theta)$, where $c = 0$, 1, or 2 is the lowest
angular momentum state in the set and serves as the cone’s label. $c = 0$
defines the symmetry relation that describes an unthreaded cone; with
$c \neq 0$, the cone has an additional Aharonov-Bohm phase arising from
$c/3$ magnetic flux quanta threaded through its tip (Fig. 3c). Angular
Figure 2 | Building a Landau level. The modes of our resonator follow the
Fock-Darwin Hamiltonian of a massive, harmonically trapped particle in a
magnetic field: the magnetic field creates a ladder of Landau levels
uniformly spaced by the cyclotron frequency, $\omega_c$, while the harmonic trap
of frequency $\omega_{\mathrm{trap}}$ uniformly splits levels within each Landau level by
$\omega_{\mathrm{trap}}^{2}/\omega_c$ (see Supplementary Information). We probe this spectrum
versus resonator length $L_{rt}$, and demonstrate that, for each $L_{rt}$, the
spectrum is determined by two energies $\nu_{(1,0)}$ and $\nu_{(0,1)}$ according to
$\nu_{(\alpha,\beta)} = \alpha\,\nu_{(1,0)} + \beta\,\nu_{(0,1)} \bmod \nu_{\mathrm{FSR}}$, where
$\omega_c = 2\pi \times \nu_{(0,1)}$ gives the cyclotron frequency and
$\omega_{\mathrm{trap}}^{2}/\omega_c = 2\pi \times \nu_{(1,0)}$ provides the harmonic trapping
frequency. Furthermore, fine-tuning $L_{rt}$ drives $\omega_{\mathrm{trap}}$ to zero, bringing
specific sets of angular momentum eigenmodes into degeneracy, thereby
forming Landau levels. a, The frequency separations between several
modes and a reference $l = 0$ mode are plotted as the harmonic confinement
is coarsely tuned relative to an approximately degenerate reference length
$L_{rt} = 78.460$ mm (corresponding free spectral range $\nu_{\mathrm{FSR}} = 3.8209$ GHz).
Solid lines are obtained as integer linear combinations of fits to the modes
labelled (1,0) and (0,1) and the free spectral range. For details on mode
indexing, see Supplementary Information. b, Main panel, we plot the
transmission spectrum of the first ~10 modes in the lowest Landau level
against small deviations from nominal degeneracy. Top inset, low order


momentum states encircling the cone tip enclose this flux three times, 
so states experience integer flux, reflected in their all radial 
extension. 
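The partition of the lowest Landau level into three sets can be checked numerically: rotating the argument of $z^{l}\exp(-|z|^{2})$ by $2\pi/3$ multiplies the state by $e^{2\pi i l/3}$, which depends only on $l \bmod 3$. The sketch below is an illustration under that reading of the symmetry condition; the function names are hypothetical and the overall normalization of the states is omitted.

```python
import cmath
import math


def lll_state(l, x, y, w0=1.0):
    """Unnormalized lowest-Landau-level state z**l * exp(-|z|**2), z = (x + iy)/w0."""
    z = complex(x, y) / w0
    return z**l * math.exp(-abs(z) ** 2)


def cone_label(l):
    """Cone label c: the lowest angular momentum in the set containing l."""
    return l % 3


def satisfies_symmetry(l, x, y):
    """Check psi(theta + 2*pi/3) == exp(2*pi*i*c/3) * psi(theta) at the point (x, y)."""
    angle = 2 * math.pi / 3
    x_rot = x * math.cos(angle) - y * math.sin(angle)
    y_rot = x * math.sin(angle) + y * math.cos(angle)
    lhs = lll_state(l, x_rot, y_rot)
    rhs = cmath.exp(2j * math.pi * cone_label(l) / 3) * lll_state(l, x, y)
    return cmath.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)


if __name__ == "__main__":
    for l in range(9):
        print(f"l = {l}: cone c = {cone_label(l)}, symmetric = {satisfies_symmetry(l, 0.4, 0.7)}")
```

Each state satisfies the condition only with its own label $c = l \bmod 3$, which is why the three sets behave as lowest Landau levels on three differently threaded cones.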

Away from the apex, photons on each cone behave as in a flat space 
lowest Landau level. In Fig. 3d, we identify each cone by the lowest 
angular momentum state supported around its apex. Then, on each 
cone, we show that we can create arbitrary angular momentum states 
($l = 0, 1$) about displaced points so long as the displaced mode does not
self-intersect or encircle the cone tip. Beyond reflecting the invariance
of our system under magnetic translations, this permits the creation 
of canonical fractional quantum Hall states in a future interacting 
system, in addition to novel Laughlin states accessible at the cone 
tip (see Supplementary Information). As a visualization, Fig. 3e, f 
projects these displaced $l = 0$ and $l = 1$ modes onto a cone, further
demonstrating that, away from the apex, modes on the cone closely 
resemble modes on a regular plane. 

The topological numbers that characterize quantum Hall phases are 
predicted to specify the response of the photonic local density of states 
(LDOS) to magnetic field and spatial curvature, as described by the 
Wen-Zee theory!*-!> (see Supplementary Information). We perform 
an experimental test of this theory by measuring the LDOS (Fig. 3g-i) 


modes become degenerate to within a resonator linewidth, $\kappa \sim 200$ kHz,
while in the main panel, we observe weak level repulsion (approximately
equal to the resonator linewidth) in the higher order modes consistent
with mode mixing due to mirror imperfections of ~$\lambda/5{,}000$. $\omega_{\mathrm{trap}}$ is
presented on the upper horizontal axis. Bottom insets, as the resonator
is tuned through degeneracy, the harmonic potential (orange surface)
changes sign, while the magnetic field (blue arrows) remains nearly
unchanged. c, The lifetimes (and corresponding finesses) of representative
modes decrease for higher mode numbers both away from degeneracy
(blue circles) and near degeneracy (green squares). Here $\Delta L$ is the offset
of the round-trip resonator length from nominal degeneracy. d, With
significant residual harmonic trapping ($\Delta L = 124$ µm), angular
momentum modes are simple rings. As the trapping is reduced
($\Delta L = 32$ µm), high angular momentum modes begin to mix owing to local
disorder. When the trapping is precisely cancelled ($\Delta L = -3$ µm), mirror
imperfection consistent with a single nanoscopic scratch dramatically
alters the modes’ shape away from the predicted near-Laguerre-Gauss
profiles. Even the first resonator mode is noticeably triangular, indicating
at least a mixing of Laguerre-Gauss $l = 0$ and $l = 3$ modes. Overcoming
this disorder necessitates only ~MHz photon-photon interactions to
explore strongly correlated physics.
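The mode-indexing rule in the legend, integer combinations of two fitted frequencies folded into one free spectral range, can be sketched in a few lines. This is a minimal illustration, not the authors' fitting code: the function names are hypothetical, the relation FSR = c/L for a running-wave (ring) resonator is a standard result rather than a statement from the source, and the example value chosen for $\nu_{(1,0)}$ is purely illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_spectral_range(round_trip_length_m):
    """Free spectral range of a ring resonator, c / L_rt, in Hz."""
    return SPEED_OF_LIGHT / round_trip_length_m


def mode_frequency(alpha, beta, nu_10, nu_01, nu_fsr):
    """Frequency of mode (alpha, beta): alpha*nu_10 + beta*nu_01, folded modulo nu_fsr."""
    return (alpha * nu_10 + beta * nu_01) % nu_fsr


if __name__ == "__main__":
    nu_fsr = free_spectral_range(78.460e-3)  # reference length from the legend
    print(f"FSR = {nu_fsr / 1e9:.4f} GHz")
    nu_01 = 2.1671e9  # cyclotron frequency / 2*pi quoted in the main text
    nu_10 = 1.0e6     # illustrative residual-trap splitting (assumed value)
    for alpha, beta in [(1, 0), (0, 1), (3, 0), (0, 2)]:
        nu = mode_frequency(alpha, beta, nu_10, nu_01, nu_fsr)
        print(f"({alpha},{beta}): {nu / 1e9:.4f} GHz")
```

The computed free spectral range for the 78.460 mm round-trip length agrees with the 3.8209 GHz quoted in the legend.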


via transmission images of each state in the relevant weakly split Landau 
level and summing these images (see Supplementary Information). We 
then compare the LDOS near the cone tip with the flat space density 
away from the tip (within each panel Fig. 3g-i) and compare the LDOS 
with different quantities of flux threaded (between panels Fig. 3g-i). 
We clearly observe a density build-up for the c=0 cone; however, we 
find a vanishing LDOS on the other two cones, reflecting additional 
magnetic flux threaded through their tips equal to $-\Phi_0/3$ and $-2\Phi_0/3$,
where $\Phi_0$ is the magnetic flux quantum (Fig. 3c). According to the
Wen-Zee theory, the expected excess state number is given by
$\delta N = \frac{2\bar{s}}{3} - \frac{c}{3}$, where $c/3$ is the number of flux quanta threaded through
the cone tip and $\bar{s}$ is a parameter called the mean orbital spin that
characterizes particles’ coupling to spatial curvature and is predicted to be
1/2 for the lowest Landau level'” (see Supplementary Information). We 
therefore expect $\delta N = 1/3$, 0, and $-1/3$ of a state near the tips of the
$c = 0$, 1, and 2 cones, respectively. By integrating the measured LDOS
excess or deficit near the apex, we measure the state number excess to
be 0.31(2) on the $c = 0$ cone, −0.02(1) on the $c = 1$ cone, and −0.35(2)
on the $c = 2$ cone, yielding the experimentally measured value
$\bar{s} = 0.47(1)$. We find quantitative agreement between our measured
results and the Wen-Zee theory. 
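The comparison with the Wen-Zee prediction can be retraced from the quoted numbers. The sketch below inverts $\delta N = 2\bar{s}/3 - c/3$ for each cone and takes a plain unweighted mean; the averaging scheme is an assumption made here for illustration (the source does not state how the three measurements were combined), and the function names are hypothetical.

```python
def expected_excess(c, s_bar=0.5):
    """Wen-Zee excess state number at the cone tip: 2*s_bar/3 - c/3."""
    return 2.0 * s_bar / 3.0 - c / 3.0


def orbital_spin_from_excess(delta_n, c):
    """Invert delta_N = 2*s_bar/3 - c/3 for the mean orbital spin s_bar."""
    return 1.5 * (delta_n + c / 3.0)


# Measured state number excess on the c = 0, 1, 2 cones, as quoted in the text.
MEASURED = {0: 0.31, 1: -0.02, 2: -0.35}

if __name__ == "__main__":
    estimates = [orbital_spin_from_excess(dn, c) for c, dn in MEASURED.items()]
    for (c, dn), s in zip(MEASURED.items(), estimates):
        print(f"c = {c}: measured {dn:+.2f}, expected {expected_excess(c):+.2f}, s_bar = {s:.3f}")
    print(f"unweighted mean s_bar = {sum(estimates) / len(estimates):.2f}")
```

Under this simple averaging the three cones give a mean orbital spin close to the quoted 0.47, against a predicted value of 1/2.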
Figure 3 | Photonic lowest Landau levels on a cone. a, At degeneracy,
all resonator modes display three-fold symmetry. We present a very large
displaced angular momentum mode with radial extent up to 8 times the
mode waist, $w_0$, implying that ~20 modes must be degenerate. The rapid
phase winding for large $l$ modes causes the strong fringing pattern when
the mode self-interferes. Inset, an $l = 0$ mode at the same scale. b, We
project another large-angular-momentum mode onto a cone and view it
from above the apex. We observe a general property that circular orbits
must encircle the cone apex either zero or three times. Inset, the original
image of the mode. The pair of rays overlaying the inset image corresponds
to the cut in the main image. c, The twisted resonator corresponds to
Landau levels on three cones with differing quantities of magnetic flux
threaded through the tip. The cone built out of $l = 0, 3, 6, \ldots$ has no flux
threading; the cone built out of $l = 1, 4, 7, \ldots$ is threaded by $\Phi_0/3$; and
the cone built out of $l = 2, 5, 8, \ldots$ is threaded by $2\Phi_0/3$, where $\Phi_0$ is the
magnetic flux quantum. d, With the resonator tuned to degeneracy, we
identify the energies of the $l = c$ modes for $c = 0$, 1, or 2 by the transmission
peaks (blue, orange, and green curves, respectively) that correspond to the
correct observed transmitted modes’ profiles (single images, labelled).
The degenerate sets starting with these modes each form a lowest Landau
level on different cones. Except at the apex, each cone is flat, so away from

We have demonstrated a synthetic magnetic field for continuum 
photons. Furthermore, we have created an integer quantum Hall sys- 
tem in curved space, a long-standing challenge in condensed matter 
physics. We can extend our tests of the Wen-Zee theory by measuring 
fractional state number excess in higher Landau levels and examining 
the connection between the mean orbital spin and the Hall viscosity” 
(see Supplementary Information). Our approach clears a path to the 
the tip each lowest Landau level supports the modes of, and therefore the
dynamics of, a planar lowest Landau level with $l = 0, 1, 2, \ldots$ defined
about a displaced point. On each cone, we show displaced $l = 0$ (bottom
two) and $l = 1$ (top two) modes. For large displacements (right two), these
modes are undistorted; however, for small displacements (left two), where
there is significant mode amplitude at the tip, we observe distortions
due to self-interference, similar to a. e, f, Displaced $l = 0$ and $l = 1$ modes
from d are projected onto a cone to show how observed mode images
may be interpreted on a conical surface. g-i, We explore the effects of
curvature and flux threading near the tip by measuring the local density of
photonic states. For the $c = 0$ cone (i), we find an approximately threefold
increase in local state density near the cone apex above a constant
background plateau of density. This corresponds to an additional one-third
of a state localized near the apex. For the cones with $c = 1$ and 2
(h and g, respectively), we find a vanishing local density of states near the
apex, reflecting the negative magnetic flux threading through the cone
apex. Each unit of flux removes one-third of a state local to the apex so
that the $c = 1$ cone has no additional states, and the $c = 2$ cone is missing
one-third of one state. The data to the right display a slice through the
middle of each image; the grey curves are fits to the expected analytic form
(see Supplementary Information).


photonic fractional quantum Hall regime, as it is compatible with 
Rydberg-mediated strong photon-photon interactions’, and does 
not require the low particle densities (and thus weakened interactions) 
necessary to map Laughlin physics onto a lattice. Simply avoiding the 
cone apex will allow the spectroscopic creation and detection of flat 
space fractional quantum Hall states such as the Laughlin wavefunction 
(see Supplementary Information), while exploring the apex will afford 
the opportunity to investigate the interplay of geometry and topology
in strongly correlated quantum materials. 
Switching stiction and adhesion of a liquid on a solid


Stijn F. L. Mertens!?*, Adrian Hemmi**, Stefan Muff’, Oliver Gröning*, Steven De Feyter!, Jürg Osterwalder? & Thomas Greber?*


When a gecko moves on a ceiling it makes use of adhesion and 
stiction. Stiction—static friction—is experienced on microscopic 
and macroscopic scales and is related to adhesion and sliding 
friction!. Although important for most locomotive processes, the 
concepts of adhesion, stiction and sliding friction are often only 
empirically correlated. A more detailed understanding of these 
concepts will, for example, help to improve the design of increasingly 
smaller devices such as micro- and nanoelectromechanical switches”. 
Here we show how stiction and adhesion are related for a liquid drop 
on a hexagonal boron nitride monolayer on rhodium³, by measuring
dynamic contact angles in two distinct states of the solid-liquid
interface: a corrugated state in the absence of hydrogen intercalation 
and an intercalation-induced flat state. Stiction and adhesion can be 
reversibly switched by applying different electrochemical potentials 
to the sample, causing atomic hydrogen to be intercalated or not. 
We ascribe the change in adhesion to a change in lateral electric 
field of in-plane two-nanometre dipole rings*, because it cannot 
be explained by the change in surface roughness known from the 
Wenzel model®. Although the change in adhesion can be calculated 
for the system we study’°, it is not yet possible to determine the 
stiction at such a solid-liquid interface using ab initio methods. 
The inorganic hybrid of hexagonal boron nitride and rhodium is 
very stable and represents a new class of switchable surfaces with 
the potential for application in the study of adhesion, friction and 
lubrication. 

Every object at rest sticks with some adhesion to its substrate. If this 
object is moved, then a force must act on it. This force has to overcome 
the stiction threshold, above which the object starts to move and below 
which it sticks to the substrate. This fundamentally simple principle 
indicates a relationship between adhesion and stiction that is valid 
for a range of length scales from single-atom manipulation’ up to the 
everyday experience of moving ourselves. Stiction is also related to 
sliding friction! and surface diffusion’. Empirically, these properties 
are connected via dimensionless coefficients that relate, for example, 
the diffusion barrier to the adsorption energy or the sliding friction 
force to the load. The advent of single-atom probes on surfaces and 
the measurement of the heat of adsorption have made the Evans- 
Polanyi relation, which postulates proportionality between the acti- 
vation energy for diffusion and the adsorption energy, experimentally 
accessible!”. For macroscopic objects, on the other hand, Amontons’ 
first law, which states that sliding friction is proportional to the load, 
is much more complex because of the many length scales involved. 
This complexity also applies to stiction and, consequently, an ab initio 
understanding is difficult'’. For such problems we rely on clear-cut 
responsive-surface model systems to study the influence of micro- 
scopic effects on their macroscopic expressions and vice versa. 

For responsive surfaces, a collective change of a microscopic atomic 
or molecular parameter triggers a macroscopic property change such 
as the wetting angle of a liquid; such surfaces are central to nanosci- 
ence and smart-materials research’"'". Typically, the responsiveness 
is invoked by a change of conformation or charge state of organic 


molecules at the solid—liquid interface, which can be triggered by 
light'?, temperature’? or electric fields’*. The organic molecules at 
the basis of the switching, however, render these surfaces quite fragile 
and limited to near-ambient temperatures. By contrast, the inorganic 
hexagonal boron nitride/rhodium (h-BN/Rh) hybrid we focus on has 
very high thermal and chemical stability (see Methods), and therefore 
signifies a class of responsive surfaces with considerable technolog- 
ical promise. The stability of the h-BN/Rh system is related to the 
covalently bonded network of the h-BN ‘skin’ in comparison with the 
weaker supramolecular interactions in self-assembled monolayers. 

The boron nitride nanomesh is a corrugated monolayer of h-BN 
on rhodium*!*: 13 x 13 h-BN units form a superhoneycomb struc- 
ture on 12 x 12 Rh(111) unit cells, in which particularly strong lateral 
electric fields exist that are, for instance, decisive for the self-assembly of 
molecules*. The nanomesh structure is stable in vacuum up to 1,000°C 
and survives immersion into liquids*!°. In ultrahigh vacuum, it was 
found that the boron nitride layer could be made flat by intercalation 
of atomic hydrogen'’. Here, we demonstrate this effect under electro- 
chemical conditions and show that h-BN/Rh(111) is a surface with 
switchable wetting and adhesion. 
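The 13-on-12 coincidence described above fixes the superstructure period once the Rh(111) in-plane spacing is known. The sketch below works this out; the bulk Rh lattice constant of 0.3803 nm is an assumed literature value that does not appear in the source, and the function names are illustrative.

```python
import math

RH_LATTICE_CONSTANT_NM = 0.3803  # assumed bulk fcc lattice constant of rhodium


def rh111_spacing_nm(a_fcc_nm=RH_LATTICE_CONSTANT_NM):
    """In-plane nearest-neighbour distance on an fcc(111) surface, a / sqrt(2)."""
    return a_fcc_nm / math.sqrt(2.0)


def nanomesh_period_nm(rh_cells=12):
    """Superstructure period spanned by 12 x 12 Rh(111) unit cells."""
    return rh_cells * rh111_spacing_nm()


def implied_hbn_spacing_nm(rh_cells=12, hbn_units=13):
    """h-BN in-plane spacing implied by fitting 13 units onto 12 Rh cells."""
    return nanomesh_period_nm(rh_cells) / hbn_units


if __name__ == "__main__":
    print(f"Rh(111) spacing     : {rh111_spacing_nm():.4f} nm")
    print(f"nanomesh period     : {nanomesh_period_nm():.2f} nm")
    print(f"implied h-BN spacing: {implied_hbn_spacing_nm():.4f} nm")
```

Under this assumption the period comes out near 3.2 nm, the periodicity reported for the superstructure in the STM data.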

The electrochemical switching of the surface texture is based on 
intercalation of atomic hydrogen, as is demonstrated in Fig. 1. Figure 1a
shows cyclic voltammograms for a clean Rh(111) film and a h-BN 
nanomesh sample in perchloric acid solution in a sessile drop 
configuration’®. In the negative scan direction, the Rh(111) reference 
voltammogram exhibits the typical atomic hydrogen adsorption peak 
just before the onset of molecular hydrogen evolution!’. On h-BN/ 
Rh(111), this peak occurs at less negative potentials, which suggests 
that the process is energetically slightly more favourable on the nano- 
mesh. During the reverse scan, the single desorption peak on Rh(111) 
shows a double peak for h-BN/Rh(111), which indicates a two-stage 
desorption process. Integration of the hydrogen adsorption peak yields 
75 µC cm⁻², which amounts to one-third of the bare Rh(111) measurement
where one monolayer of hydrogen may be adsorbed, and
confirms earlier observations'®!’. On continued cycling (Fig. 1b), 
a slight sharpening of the adsorption peak occurs; the charge related to 
the peak integral changes by less than 15%, demonstrating the stability 
of the h-BN overlayer upon cycling. 
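As a rough plausibility check on the one-third figure, the measured charge can be compared with the charge of an ideal hydrogen monolayer, one electron per Rh(111) surface atom. This is a sketch under stated assumptions: the Rh lattice constant (0.3803 nm) and the one-electron-per-site picture are not from the source, which compares against a measured bare-Rh(111) value rather than this ideal estimate. The function names are hypothetical.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # C (exact SI value)
RH_LATTICE_CONSTANT_CM = 0.3803e-7     # assumed bulk fcc lattice constant of Rh, cm


def fcc111_site_density_per_cm2(a_fcc_cm=RH_LATTICE_CONSTANT_CM):
    """Surface atoms per cm^2 on fcc(111): 2 / (sqrt(3) * d**2), with d = a / sqrt(2)."""
    d = a_fcc_cm / math.sqrt(2.0)
    return 2.0 / (math.sqrt(3.0) * d**2)


def monolayer_charge_uC_per_cm2():
    """Charge for one electron per surface atom, in microcoulombs per cm^2."""
    return fcc111_site_density_per_cm2() * ELEMENTARY_CHARGE_C * 1e6


def coverage_fraction(measured_uC_per_cm2):
    """Measured charge as a fraction of the ideal monolayer charge."""
    return measured_uC_per_cm2 / monolayer_charge_uC_per_cm2()


if __name__ == "__main__":
    print(f"ideal monolayer charge: {monolayer_charge_uC_per_cm2():.0f} uC/cm^2")
    print(f"75 uC/cm^2 corresponds to {coverage_fraction(75.0):.2f} monolayer")
```

The ideal-monolayer estimate puts 75 µC cm⁻² at a little under one-third of a monolayer, broadly consistent with the fraction stated in the text.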

The changes in nanotexture that accompany the hydrogen adsorp- 
tion and desorption were visualized directly by in situ electrochem- 
ical scanning tunnelling microscopy (EC-STM). Figure 1c-f shows 
the same area for different substrate potentials. Initially, at potential 
E₁ (Fig. 1c), the hexagonal pattern of pores separated by wires is
observed'®*!”. The 3.2-nm periodicity can also be seen in the auto- 
correlation of a given section from one terrace, and the corrugation is
reflected by the root-mean-square (r.m.s.) roughness. Crossing of the 
hydrogen adsorption peak (potential E₂, Fig. 1d) leads to flattening of
the surface, which is reflected in the vanishing of the superstructure 
and a decrease in the r.m.s. roughness. A tip change between Fig. 1c 
and Fig. 1e caused a contrast inversion of the pore, which, however,
does not affect the r.m.s. analysis. When the potential is set back to 
the initial value, the corrugation gradually reappears (Fig. 1e), which
indicates that deintercalation is slower than the forward process (see 
Methods and Extended Data Fig. 2). After the cycle (Fig. 1f), the 
superhoneycomb shows different imperfections (for example, some 
pores appear connected). The physical picture that emerges is illus- 
trated in Fig. 1g: electrochemical intercalation of hydrogen reversi- 
bly flattens the h-BN/Rh nanomesh, and in doing so switches off the 
dipole rings in the surface. 

To prove unequivocally that hydrogen intercalation is at the basis 
of the observed switching effect, we conducted an ambient-to- 
vacuum transfer experiment with the goal of quantifying the interca- 
lated hydrogen. The electrochemical treatment was performed in heavy 
water, and the h-BN/Rh(111) sample was extracted from the solution 
under potential control and transferred in less than 10 min to vacuum 
(see Methods) where thermal desorption spectroscopy (TDS) was 
carried out, as shown in Fig. 2. Deuterium desorption from the sample
held at potential E₂ (flat state) is obvious, whereas no substantial
desorption took place from a sample that was kept at E₁ (corrugated)
throughout. The desorption temperature indicates slightly higher 
binding energy as compared to hydrogen on bare Rh(111) (ref. 20). 
This further confirms hydrogen intercalation, because hydrogen (or 
deuterium) on top of h-BN would not bind as strongly as on rhodium 
and would not survive the transfer from the liquid to the vacuum. Our 
experiments do not show atomistic details of the way in which the 
hydrogen intercalates, although it has been claimed”! that protons can 
pass through an intact h-BN layer, and that this process is facilitated by 
the presence of a platinum group metal (see Methods). In this scenario, 
the hydrated protons in the liquid electrolyte are probably stripped of 
their solvation shell the instant intercalation occurs, or a Grotthuss- 
type transport mechanism may be involved”*. 

Stiction of the electrolyte on the h-BN surface is studied by measur- 
ing dynamic contact angles, which differ from the equilibrium value 


Fig. 1 axis label: Potential (V vs Ag/AgCl)

adsorption/desorption peaks over multiple
cycles (28 cycles shown, recorded in a standard
electrochemical cell; scan rate, 10 mV s⁻¹).
The charge of the hydrogen adsorption peak
as a function of cycle number is shown in the
inset. c-f, Sequence of in situ STM images for a
region with three atomic terrace levels, at various
substrate potentials: E₁ = 0 V (c); E₂ = −0.25 V
(d); switching E₂ → E₁ (e; image scanned from
bottom to top); and E₁ = 0 V after recovery
of corrugation (f). The red square highlights the
same area of the sample in all panels; close-ups
(1.2× magnification) are shown in the
bottom-left panels. The white numbers are
the r.m.s. roughness in units of the terrace
height. The bottom-right panels show the
autocorrelations of the bottom-left panels and
reveal, if present, the 3.2-nm periodicity of the
h-BN superstructure. Image sizes, 66 nm × 66 nm;
tunnelling current, 0.1 nA; tip potential fixed at
−0.45 V. g, Three-dimensional representation
of the flat (left) and the corrugated h-BN layer
(right). N, sky blue; B, pink; Rh, dark grey; H,
white. For clarity, the heights of N and B above
the Rh top layer have been stretched by a factor of
three. Coordinates taken from refs 17 and 29.


in Young’s equation. This difference can be directly observed if a liquid
drop on a solid loses volume by evaporation, where the drop footprint
will start to move only below a critical contact angle (the receding 
angle). Likewise, when the volume of the drop is gradually increased, 
the footprint moves when the advancing contact angle is reached”*”*. 
The difference in advancing and receding angles and the concomitant 
contact angle hysteresis are a macroscopic expression of adhesion and 
stiction. In Fig. 3, we show dynamic contact angle measurements for 


Figure 2 | Deuterium thermal desorption spectra. h-BN/Rh(111)
was exposed to 0.1 M DClO₄ in D₂O at substrate potentials of $E_1$ = 0 V
(black trace) and $E_2$ = −0.25 V (red trace). After loading the samples
in the electrolyte they were transferred to ultrahigh vacuum for TDS
(see Methods). Heating rate, 0.9 K s⁻¹.
Figure 3 | Dynamic contact angle measurements. a, Left (orange) and
right (red) contact angles (right axis) and drop footprints (with diameters
of 2r; grey, left axis) as functions of time. Pictures of an advancing (1)
and a receding (2) drop in the transients at 100 mV (corrugated state;
indicated by the circled points in the left panel) are shown on the right.
b, Left (dark blue) and right (light blue) contact angles and drop footprints
as functions of time. Pictures of an advancing (3) and receding (4) drop
in the transients at −350 mV (flat state; indicated by the circled points
in the left panel) are shown on the right. c, Contact angles in a (100 mV,
red) and b (−350 mV, blue) versus 1/r for times above 40 s. Extrapolation
to 1/r = 0 (infinite drop footprint) yields reduced contact angles that
are independent of the contact line. d, Reduced receding (bottom) and
advancing (top) angles for different applied potentials, where the hydrogen
intercalation peak is found at about −0.2 V. The large red and blue points
correspond to the values determined in c. The solid lines are sigmoid fits
through all experimental points (green circles), from which values of
$\theta_\mathrm{a}$ and $\theta_\mathrm{r}$ for the corrugated (76.0° ± 1.2°
and 24.5° ± 1.4°, respectively) and the flat (84.3° ± 2.3° and 40.1° ± 3.2°,
respectively) state are determined.
The grey band indicates the 1-s.d. prediction interval.


applied potentials around the hydrogen adsorption peak. The angles as 
determined from inflating and deflating the drops in the corresponding 
videos (Supplementary Videos 1, 2) are shown for a potential above 
(Fig. 3a) and below (Fig. 3b) the hydrogen peak, that is, for the corrugated
and the flat surface, respectively. The drop volume was changed
periodically and distinct receding angles $\theta_\mathrm{r}$ and advancing
angles $\theta_\mathrm{a}$ were observed. Four positions are shown as
photographs of the electrolyte drop on a h-BN/Rh(111) sample with the capillary, which is
0.85 mm in diameter, entering the drop from the top. The experiments 
show distinct hysteresis in the contact angles of about 60°. The observed 
angles also depend on residual defects, which is reflected in the asym- 
metry between the left and the right footprint, and in sudden angle 
Figure 4 | Wetting angle hysteresis and stiction. The footprint of a drop
starts to move with an advancing (left; $\theta_\mathrm{a}$) or a receding
(right; $\theta_\mathrm{r}$) angle, if the friction-induced lateral tension
exceeds the stiction threshold $\gamma_\mathrm{H}$. The equilibrium angle
from Young’s equation, $\theta_0$, lies between $\theta_\mathrm{r}$ and
$\theta_\mathrm{a}$. $\gamma_\mathrm{S}$, $\gamma_\mathrm{LS}$ and
$\gamma_\mathrm{L}$ are the corresponding solid–gas, liquid–solid and
liquid–gas interface energies, respectively.


changes that are reproduced for subsequent cycles. All the effects due 
to residual imperfections, however, are distinct from the effect of the 
electrochemical potential. The size dependence of the wetting hyster- 
esis is a consequence of the line tension, that is, the work required to 
change the length of the drop perimeter”. If the wetting angles are 
plotted against the inverse of the footprint radius (Fig. 3c), then the line 
tension and the extrapolation of the wetting angle for infinite footprints 
may be inferred. In Fig. 3d, these extrapolated advancing and receding 
angles are plotted for a large set of experiments as a function of the 
applied potential, correlating the applied substrate potential with the 
measured contact angles. Clearly, the hystereses for potentials above 
−200 mV are different from those below this threshold of the hydrogen
adsorption peak. Further experiments indicate that neither visible evo- 
lution of molecular hydrogen (where the reduction current increases 
exponentially in the voltammogram) nor electrowetting effects”®”” can 
be responsible for the observed changes in contact angle. 
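The extrapolation to infinite footprints described above amounts to a straight-line fit of contact angle against 1/r, with the intercept at 1/r = 0 giving the size-independent (reduced) angle. A minimal Python sketch follows; the function name and the synthetic data are illustrative assumptions, not the authors' analysis code or measured values.

```python
def extrapolate_reduced_angle(radii_mm, angles_deg):
    """Least-squares line through (1/r, angle); returns (intercept, slope).

    The intercept is the angle extrapolated to 1/r = 0 (infinite footprint);
    the slope carries the size dependence of the angle.
    """
    x = [1.0 / r for r in radii_mm]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(angles_deg) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, angles_deg))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, noise-free data for illustration only (NOT measured values).
radii = [1.0, 1.5, 2.0, 3.0]                  # hypothetical footprint radii
angles = [76.0 + 3.0 / r for r in radii]      # hypothetical advancing angles
reduced_angle, slope = extrapolate_reduced_angle(radii, angles)
```

With real transients, one fit would be made per branch (advancing and receding) and per potential, as in Fig. 3c.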

The concept for stiction and the concomitant contact angle hysteresis
is sketched in Fig. 4: the static friction $\sigma$ balances the forces
that distort the Young equilibrium
$\gamma_\mathrm{S} - \gamma_\mathrm{LS} - \gamma_\mathrm{L}\cos(\theta_0) = 0$
(in which $\gamma_\mathrm{S}$, $\gamma_\mathrm{LS}$ and $\gamma_\mathrm{L}$
are the interfacial energies or tensions of the solid–gas, liquid–solid and
gas–liquid interfaces, and $\theta_0$ is the Young equilibrium angle). The
static friction $\sigma$ acts parallel to the surface, as do
$\gamma_\mathrm{S}$ and $\gamma_\mathrm{LS}$, and invokes a new equilibrium
angle $\theta_\sigma$. The new equilibrium angle is determined by

$$\gamma_\mathrm{S} - \gamma_\mathrm{LS} - \gamma_\mathrm{L}\cos(\theta_\sigma) + \sigma = 0$$

The static friction $\sigma$ may, however, not exceed a critical value
$\pm\gamma_\mathrm{H}$, above which the footprint of the drop starts to move
with an advancing or a receding angle. In Fig. 3c, d, the size-independent
receding and advancing angles (that is, the extrapolation to infinite
footprints) $\theta_\mathrm{r}$ and $\theta_\mathrm{a}$ are indicated for the
corrugated and the flat h-BN layer. Their values are given by

$$\cos(\theta_\mathrm{r,a}) = \cos(\theta_0) \pm \frac{\gamma_\mathrm{H}}{\gamma_\mathrm{L}}$$

in which $\gamma_\mathrm{H} \geq |\sigma|$ is the stiction threshold. With the
knowledge of $\gamma_\mathrm{L}$ (72 mJ m⁻² for water), which does not depend
on the surface of the solid, and the determination of $\theta_\mathrm{a}$ and
$\theta_\mathrm{r}$, $\gamma_\mathrm{H}$ and $\theta_0$ may be directly
determined.
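As a worked illustration of that last step, the relation for the receding and advancing angles can be read as cos(θ_r) = cos(θ_0) + γ_H/γ_L and cos(θ_a) = cos(θ_0) − γ_H/γ_L, so the mean and half-difference of the two cosines give cos(θ_0) and γ_H/γ_L. The sketch below applies this to the angles quoted in the Fig. 3 caption; the function name is an assumption, and the code is not the authors' evaluation script.

```python
import math


def young_angle_and_stiction(theta_a_deg, theta_r_deg):
    """Return (theta_0 in degrees, gamma_H / gamma_L) from the advancing
    and receding angles, using cos(theta_r,a) = cos(theta_0) +/- gamma_H/gamma_L."""
    cos_a = math.cos(math.radians(theta_a_deg))
    cos_r = math.cos(math.radians(theta_r_deg))
    cos_0 = 0.5 * (cos_r + cos_a)          # mean of the two cosines
    gamma_h_rel = 0.5 * (cos_r - cos_a)    # half-difference, in units of gamma_L
    return math.degrees(math.acos(cos_0)), gamma_h_rel


# Angles quoted in the Fig. 3 caption (advancing, receding)
theta0_corr, gh_corr = young_angle_and_stiction(76.0, 24.5)   # corrugated
theta0_flat, gh_flat = young_angle_and_stiction(84.3, 40.1)   # flat
```

Within rounding, this gives Young angles near 55° and 64° and a stiction threshold of about 0.33 γ_L for both states, in line with the values reported in the text.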

Our experiments indicate that the hydrogen-intercalation-induced
flattening of the h-BN nanomesh increases $\theta_0$ from 54.9° ± 0.8° to
64.5° ± 1.7°. Using the Young–Dupré equation we determine the work
of adhesion per unit area $w_\mathrm{LS}$ from $\gamma_\mathrm{L}$ and the
Young equilibrium angle $\theta_0$ (ref. 28):

$$w_\mathrm{LS} = \gamma_\mathrm{L} + \gamma_\mathrm{S} - \gamma_\mathrm{LS} = \gamma_\mathrm{L}[1 + \cos(\theta_0)]$$

The work of adhesion decreases from (1.58 ± 0.01)$\gamma_\mathrm{L}$ by
9.2% ± 1.8% in going from the corrugated to the flat layer. The stiction
threshold $\gamma_\mathrm{H}$ of (0.33 ± 0.01)$\gamma_\mathrm{L}$ and
(0.33 ± 0.03)$\gamma_\mathrm{L}$ for corrugated and flat layers,
respectively, is (within the error bars) the same for the two interfaces.
The change in work of adhesion is incompatible with a Wenzel model’, 
according to which a change in effective surface area is responsible for 
a change in wetting angle: the effective surface area of the corrugated 
h-BN is 3% or 0.5% larger than that of the flat layer if a calculated 
corrugation height of 316 pm (ref. 29) or 83 pm (ref. 17) is used. Thus, 
to explain the magnitude of the change in adsorption energy, an elec- 
tronic effect must amplify the change induced by the corrugation. We 
argue that this electronic effect is due to the loss of the lateral electric 
fields (2-nm dipole rings)* in the hydrogen-intercalated surface. The 
ratio of adsorption energies can be tested by comparing them to water 
adsorption®*”. Taking the calculated adsorption energies of water?! 
for different adsorption sites within the h-BN/Rh(111) supercell, we 
calculate the average adsorption energies of the corrugated and the 
flat state. If we take for the corrugated case the portion of the wire, 
pore (hole) and rim regions from Xe-adsorption’, and set the flat state 
equivalent to the unfavourable adsorption on the wires, then we find 
an average adsorption energy that is 18% lower for water monomers 
and 9% lower for hexamers on intercalated h-BN compared to the 
h-BN nanomesh. The comparison to the experiment is much better for 
hexamers, which makes sense because hexamers represent the simplest 
possible ‘drop’ in which only every second water molecule makes a 
hydrogen bond to the boron nitride*®. Beyond these estimations of 
the adsorption energies involved in the switching phenomenon, the 
comprehensively characterized structure of the nanomesh makes this 
surface very attractive for full-scale theoretical treatments of wetting, 
which is difficult for less ordered systems. Molecular dynamics results 
estimate a change in adhesion (adsorption energy) of 4% (ref. 6), 
which compares favourably with the 9.2% extracted from our wetting 
angle data; however, a quantitative theoretical description of stiction 
is still a distant prospect. 
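The quoted change in the work of adhesion can be checked from the two Young angles with the Young–Dupré relation, w = γ_L[1 + cos(θ_0)]. The following is a minimal sketch under that assumption; the function name is hypothetical, and only the central values from the text are used, without error propagation.

```python
import math

GAMMA_L = 72.0  # liquid-gas interface energy of water in mJ m^-2, as quoted in the text


def work_of_adhesion(theta0_deg, gamma_l=GAMMA_L):
    """Young-Dupre work of adhesion per unit area, in the units of gamma_l."""
    return gamma_l * (1.0 + math.cos(math.radians(theta0_deg)))


w_corrugated = work_of_adhesion(54.9)   # corrugated nanomesh
w_flat = work_of_adhesion(64.5)         # hydrogen-intercalated, flat layer
relative_decrease = (w_corrugated - w_flat) / w_corrugated
```

The relative decrease comes out at roughly 9%, well above the 0.5% to 3% change in effective surface area that a Wenzel-type argument would allow.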

We have presented a switchable surface on which macroscopic 
static friction and adhesion can be linked to our understanding of 
water adsorption on this surface at the microscopic scale. The rel- 
atively simple structure of the h-BN nanomesh makes it amenable 
to accurate descriptions down to the atomic level, and an attractive 
model system for full-scale theoretical analyses of switchable surfaces, 
wetting, friction and lubrication. The fact that the intercalated surface 
survives emersion from the electrolyte could make this test system 
widely applicable for the study of friction and lubrication, even if the 
adhering objects differ from liquid drops. In a biological context, it 
might be possible to create complex multi-cellular arrangements by 
influencing cell migration through controlled switching between 
high and low adhesiveness to cells. In the context of technological 
advancement, the high thermal and chemical stability of the system 
means it could be used even in harsh environments, for example, to 
influence capillary action, stiction and adhesion in microfluidic or 
nanoelectromechanical devices. Finally, beyond the field of adhesion 
and friction, electrochemical hydrogen intercalation may be used to 
obtain freestanding boron nitride layers or, more exotically, to stably 
confine tritium beneath a maximally transparent single-atomic-layer 
moderator, as is needed to determine the neutrino mass through 
β-decay spectroscopy.


METHODS


Experimental details. The experiments were performed in a clean room on 
single-layer h-BN grown on 150-nm-thick single-crystalline rhodium films on 
Si(111) wafers, which were protected with a 40-nm yttria-stabilized zirconia (YSZ) 
diffusion barrier*’. The wetting angles were determined by a laboratory-built 
CCD camera set up to image at 15 frames per second and with an automated 
angle evaluation scheme (see Methods section ‘Error analysis’). Electrochemical 
potential control was achieved using a Metrohm-Autolab PGSTAT101
potentiostat, using a laboratory-built glass syringe carrying a Ag/AgCl/3 M 
NaCl reference and Pt wire counter electrode. All electrochemical potentials 
are reported versus the Ag/AgCl reference. All glassware, including the glass 
capillary for in situ contact angle measurements, was cleaned by boiling in 20% 
nitric acid and rinsing with ultrapure water (MilliQ, Millipore, 18.2 MΩ cm),
to ensure perfect wetting. The electrolyte was prepared from reagent-grade
70% HClO₄ (Sigma-Aldrich) and ultrapure water. For the deuterium desorption
experiments, light water was replaced with D₂O (Fluka, D > 99.8%). All
operations were carried out in an Ar-filled glove box in a clean room*’. In situ 
STM was performed with an Agilent PicoLE system, using an electrochemically 
etched W tip that was coated with a thermoplastic polymer to minimize the 
faradaic current. The thermal desorption experiments were performed in an 
apparatus equipped with a calibrated quadrupole mass spectrometer (QMS 200 
M2, Pfeiffer Vacuum), and pumped with a turbomolecular pump (TMU 071, 
Pfeiffer Vacuum)!”. 

Thermal and chemical stability of the h-BN/Rh nanomesh. Experience in our 
laboratories since the discovery of the boron nitride nanomesh? indicates a shelf 
life of several years under ambient conditions, stability on exposure to, and ultra- 
sonication in, common solvents (both polar and nonpolar), and after 15-min cycles 
in commercial UV-ozone cleaners. Furthermore, the nanomesh can be heated to 
400°C in air and close to 1,000°C in vacuum’. By contrast, when exposed to hot, 
strongly alkaline solutions, the h-BN appears to be removed, in line with observations
for bulk h-BN (ref. 34). Low-energy Ar⁺ bombardment followed by annealing
has been shown to produce atomically precise defects**. Hydrogen intercalation 
appears not to adversely affect the stability of the boron nitride layer!’. As shown 
here, electrochemical cycling can be performed many times without signs of dete- 
rioration (Fig. 1b). Emersion of the intercalated sample and transfer to vacuum are 
possible and yield similar amounts of intercalated hydrogen as in an all-vacuum 
experiment (next section, Fig. 2 and ref. 17). 

Adventitious adsorption from solution or the laboratory atmosphere was not
observed, and attempts to adsorb organic molecules from solution were
unsuccessful. These observations, together with the widely available cleaning methods listed
above, make the nanomesh amenable to experiments under standard laboratory 
conditions. 
Ambient-to-ultrahigh-vacuum transfer and TDS. A critical factor in our ambi- 
ent-to-vacuum experiment is the successful emersion of the sample from the 
electrochemical environment, which means that potential control is required 
until the solid—bulk electrolyte interface ceases to exist. This way, the h-BN/Rh- 
electrolyte system is never allowed to assume its open circuit potential, where 
electrochemical deintercalation of the hydrogen may occur. This also explains 
why we can rinse the sample with ultrapure water to remove superficially adsorbed 
deuterium and traces of perchloric acid, before transferring the sample to vacuum. 
The successful extraction of the sample from the electrolyte is aided by its mild 
intrinsic hydrophobicity, akin to previous work” in which hydrophobic adlayers 
on noble metal electrodes after transfer to vacuum were studied. 

The sample mounting and transfer into the vacuum chamber took about
10 min, during which the sample was exposed to water and ambient air. Because
we expected that the remaining solvent molecules on the sample may still
contribute to a deuterium signal during the desorption experiment, the sample was rinsed
with ultrapure water after removal from the D₂O solution and before entry into the
vacuum system. Artefacts from this procedure were excluded by comparing TDS 
spectra from two samples that were handled identically, but of which only one was 
held at the potential required for intercalation (see Fig. 2). 

In the design of the sample holder, special precautions were taken to limit 
heating and desorption as much as possible to the sample itself, and to avoid a 
background covering the deuterium signal from the surface or risking the cham- 
ber pressure exceeding the safe operation limit of the mass spectrometer channel 
electron multiplier. To maximize the signal-to-noise ratio, heating should occur 
as locally and as fast as possible. Extended Data Fig. 1a shows the sample holders 
used, which allow the sample to be annealed via resistive heating solely through 
the 150-nm-thick Rh film*’. In this configuration, only the top tungsten clamps 
became as hot as the sample and temperatures of 580°C could be reached with 
a maximum pressure load of about 1.2 x 10° mbar. The sample configuration 
requires the use of a pyrometer to assess the sample temperature, which was
calibrated against a thermocouple reading of a Rh sample as shown in Extended
Data Fig. 1b. Because the lowest temperature measurable by the pyrometer is
around 350°C, additional extrapolation of the curve sections outside this region
was required.

Extended Data Fig. 1b displays all relevant information of a desorption experi- 
ment, such as the applied power to the sample, pyrometer, mass spectrometer and 
integral pressure readings. To determine the sample temperature, the following 
differential equations were used, which consider the power applied to the sample 
and its heat loss through radiation and thermal conduction to the sample holder: 


$$\frac{\mathrm{d}T}{\mathrm{d}t} = aP(t) + bP(t)T(t) \qquad (1)$$

$$\frac{\mathrm{d}T}{\mathrm{d}t} = aP(t) + bT(t) \qquad (2)$$


with applied power P, sample temperature T, time t and fitting parameters a and b. 
Best results were obtained using equation (1) for the ascending temperature region 
and equation (2) for the descending temperature region of the heating/cooling 
curve. The extrapolated temperature was finally a complete numerical simulation 
of the temperature evolution, with room temperature as the starting point and the 
applied power being the only input variables. 
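A numerical simulation of this kind can be sketched with a simple forward-Euler integration. The sketch assumes that equations (1) and (2) read dT/dt = aP(t) + bP(t)T(t) and dT/dt = aP(t) + bT(t); the function name, the switch index between the ascending and descending regions, and all parameter values are hypothetical and are not the fitted values or the integration scheme used in the paper.

```python
def simulate_temperature(power, dt, a_up, b_up, a_down, b_down,
                         switch_index, t_start=25.0):
    """Forward-Euler integration of the sample temperature.

    power        : sequence of applied power values, one per time step
    dt           : time step
    a_up, b_up   : parameters for the ascending region (assumed equation (1))
    a_down, b_down : parameters for the descending region (assumed equation (2))
    switch_index : step at which the model changes from (1) to (2)
    t_start      : starting temperature (25 degrees C, as in Extended Data Fig. 1)
    """
    temps = [t_start]
    for i, p in enumerate(power):
        t = temps[-1]
        if i < switch_index:
            rate = a_up * p + b_up * p * t      # assumed form of equation (1)
        else:
            rate = a_down * p + b_down * t      # assumed form of equation (2)
        temps.append(t + dt * rate)
    return temps


# Hypothetical check: constant power, equation (2) only, relaxes to -a*P/b.
trace = simulate_temperature([10.0] * 2000, 0.1, 0.0, 0.0, 2.0, -0.1, 0)
# Hypothetical check: zero power under equation (1) leaves T unchanged.
idle = simulate_temperature([0.0] * 5, 0.1, 1.0, 0.5, 0.0, 0.0, 5)
```

In practice the parameters a and b would be fitted separately in each region against the calibrated pyrometer readings above about 350°C, and the simulation would then supply the temperature below that limit.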

Rate of hydrogen (de)intercalation. The hydrogen intercalation we study is closely 
related to underpotential deposition of hydrogen—that is, electrosorption at poten- 
tials positive from the hydrogen evolution equilibrium potential—which is known 
to occur on the platinum group metals’, and to involve comparable energies on Pt 
and Rh**?, As shown in Fig. 1a, hydrogen intercalation into h-BN/Rh occurs at 
a potential that is about 50 mV less negative than that for underpotential deposi- 
tion of hydrogen on bare Rh(111), which may be due to the fact that intercalating 
hydrogen does not have to compete with ClO₄⁻ for adsorption on Rh after passing
the boron nitride overlayer’. 

From the in situ STM image in Fig. 1e, which was recorded immediately
after switching the substrate potential from a value at which intercalated
hydrogen is stable to a value at which electrochemical deintercalation takes place,
it is apparent that recovery of the corrugation requires about half of the time 
taken to record one image. If the local corrugation is taken as a measure of 
deintercalation, then the line speed (8.1 lines per second) and image resolution 
(512 × 512 pixels) indicate a characteristic time of approximately 30 s. The
forward process (intercalation) reproducibly led to a flattening in EC-STM in less
than 1 s (Extended Data Fig. 2), indicating that intercalation is considerably
faster than deintercalation. Restoration of the corrugation must be considered 
as the last step in the electrochemical deintercalation; it is not obvious which is 
the rate-determining process, to which the pore and wire areas of the nanomesh 
may contribute differently. The observed difference in the rate of intercalation 
and deintercalation may originate from unilateral catalytic activation by the Rh 
substrate?! —that is, the different proton affinity of the h-BN/Rh and the h-BN/ 
electrolyte interface—and is also convoluted with the kinetics of underpotential 
deposition of hydrogen*”. 
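The characteristic deintercalation time quoted above follows from the scan parameters: a 512-line image at 8.1 lines per second takes about 63 s, so half an image corresponds to roughly 30 s. A short check of that arithmetic (the function and constant names are illustrative):

```python
LINES_PER_IMAGE = 512      # image resolution, 512 x 512 pixels
LINE_RATE_HZ = 8.1         # scan speed in lines per second


def elapsed_time(fraction_of_image):
    """Time in seconds needed to scan the given fraction of one STM image."""
    return fraction_of_image * LINES_PER_IMAGE / LINE_RATE_HZ


full_image_s = elapsed_time(1.0)   # time for one complete image
recovery_s = elapsed_time(0.5)     # corrugation recovers after about half an image
```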
Error analysis. The contact angles from one image frame are determined on 
the basis of fits of 32 sigmoid functions to the change in optical contrast for the 
left and right periphery of the drop, which are used to determine the location 
of the edge of the liquid drop. These fits, which are based on 60 raw data pixels, 
are all located in an interval of 0.25 mm centred at the footprint of the drop, and 
yield a fitting error for the edge location of about 0.8 µm. This error is five times
less than the optical pixel width of the image and can be explained by the very 
sharp optical contrast between the background and the liquid drop. The angle 
is then fitted with two separate linear functions, one for the real edge between 
the background and the drop and one for its mirror image. The crossing of these 
two lines reveals the location of the footprint; the contact angle $\alpha$ is
half the angle between the two fitted lines:

$$\alpha = 0.5\,[\arctan(|b_1|) + \arctan(|b_2|)] \qquad (3)$$

in which $b_1$ and $b_2$ are the slopes of the two linear fits (real angle and mirror angle).
Equation (3) is valid for the left and right angles. Gaussian error propagation of
equation (3) yields:

$$\Delta\alpha = \sqrt{\left(\frac{0.5\,\sigma_{b_1}}{1 + b_1^2}\right)^2 + \left(\frac{0.5\,\sigma_{b_2}}{1 + b_2^2}\right)^2}$$

in which $\sigma_{b_i}$ is the error of the slope fits, and the accuracy of one measured contact
angle is determined to be about 0.3°. For each extrapolation of the wetting angles
to 1/r = 0, we used about 400 angles, which were determined using the procedure
outlined above. The four linear fits shown in Fig. 3c are based on 412, 410, 344 
and 402 individual angle data points, respectively from top to bottom. Because of 
the high accuracy and the large number of data points, we may neglect the error 
of the individual angles. The influence of, for example, residual defects, which 
induce asymmetries between the left and right angles and angle spikes (Fig. 3a-c), 
is visibly larger. The error of the reduced contact angle is the fitting error of the 
θ-axis intersection; the largest error for the datasets in Fig. 3c is 0.8°, which is well
below the size of the printed dot. 
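The angle evaluation and its uncertainty can be written compactly. The sketch below assumes that equation (3) reads α = 0.5[arctan(|b1|) + arctan(|b2|)] and propagates independent slope errors through it, using d(arctan b)/db = 1/(1 + b²); the function names and the example slopes are hypothetical.

```python
import math


def contact_angle(b1, b2):
    """Contact angle in radians from the slopes of the real and mirror edge fits."""
    return 0.5 * (math.atan(abs(b1)) + math.atan(abs(b2)))


def contact_angle_error(b1, b2, sigma_b1, sigma_b2):
    """Gaussian error propagation of the contact angle for independent slope errors."""
    term1 = 0.5 * sigma_b1 / (1.0 + b1 ** 2)
    term2 = 0.5 * sigma_b2 / (1.0 + b2 ** 2)
    return math.sqrt(term1 ** 2 + term2 ** 2)


# Hypothetical slopes: a symmetric real/mirror pair with slope magnitude 1
alpha_deg = math.degrees(contact_angle(1.0, -1.0))
alpha_err_rad = contact_angle_error(1.0, -1.0, 0.01, 0.01)
```

For this symmetric pair the angle is 45°, and a slope error of 0.01 on each fit corresponds to about 0.2° in the angle.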

In Fig. 3d we show all experiments and the corresponding advancing and 
receding angles performed on three different samples. The largest errors are not 
from the angle determination procedure described above, but mainly from differ- 
ences between the samples, sample handling and manual operation of the syringe. 
The uncertainties that we cite are derived from the standard deviations of the 
mean values of the advancing and receding angles in Fig. 3d, where advancing 
and receding angles are taken from the sigmoid fits at the same applied potential. 
Values of $\theta_\mathrm{a}$ and $\theta_\mathrm{r}$ of 76.0° ± 1.2° and 24.5° ± 1.4°, respectively, for the corrugated
and 84.3° ± 2.3° and 40.1° ± 3.2°, respectively, for the flat state are found. The values
and error bars for $\theta_0$, $w_\mathrm{LS}$ and $\gamma_\mathrm{H}$ are determined from the four angles and their
standard deviations using Gaussian error propagation.
Extended Data Figure 1 | Thermal desorption spectroscopy. a, h-BN/Rh(111)
thin-film sample and sample holder ready for the desorption experiment in the
ultrahigh-vacuum chamber. b, Thermal desorption experiment with temperature
and power data shown in the top panel and pressure data in the bottom panel.
The purple data are pyrometer readings, which were calibrated to a previous
thermocouple measurement using a Rh crystal. The simulated temperature is a
solution of equations (1) and (2) with the applied power and a starting
temperature of 25°C as input variables.
intercalation. The substrate potential during scanning (image scanned
from bottom to top) was switched from $E_1$ = 0 V (deintercalated, yellow
arrow) to $E_2$ = −0.25 V (intercalated, light blue arrow) at about one-fifth
of the way from the lower edge of the STM image. On the timescale of
imaging, intercalation-induced flattening of the surface is instantaneous,
in sharp contrast to deintercalation (compare with Fig. 1e). Image size,
66 nm × 66 nm; tunnelling current, 0.1 nA; tip potential fixed at −0.45 V.
Seasonality of temperate forest photosynthesis and daytime respiration


R. Wehr, J. W. Munger, J. B. McManus, D. D. Nelson, M. S. Zahniser, E. A. Davidson, S. C. Wofsy & S. R. Saleska


Terrestrial ecosystems currently offset one-quarter of anthropogenic 
carbon dioxide (CO₂) emissions because of a slight imbalance
between global terrestrial photosynthesis and respiration’. 
Understanding what controls these two biological fluxes is 
therefore crucial to predicting climate change”. Yet there is no way 
of directly measuring the photosynthesis or daytime respiration of 
a whole ecosystem of interacting organisms; instead, these fluxes 
are generally inferred from measurements of net ecosystem–atmosphere
CO₂ exchange (NEE), in a way that is based on assumed
ecosystem-scale responses to the environment. The consequent 
view of temperate deciduous forests (an important CO₂ sink) is
that, first, ecosystem respiration is greater during the day than at 
night; and second, ecosystem photosynthetic light-use efficiency 
peaks after leaf expansion in spring and then declines’, presumably 
because of leaf ageing or water stress. This view has underlain 
the development of terrestrial biosphere models used in climate 
prediction*” and of remote sensing indices of global biosphere 
productivity™®. Here, we use new isotopic instrumentation’ to 
determine ecosystem photosynthesis and daytime respiration® in a 
temperate deciduous forest over a three-year period. We find that 
ecosystem respiration is lower during the day than at night—the 
first robust evidence of the inhibition of leaf respiration by light?" 
at the ecosystem scale. Because they do not capture this effect, 
standard approaches!” overestimate ecosystem photosynthesis 
and daytime respiration in the first half of the growing season at 
our site, and inaccurately portray ecosystem photosynthetic light- 
use efficiency. These findings revise our understanding of forest- 
atmosphere carbon exchange, and provide a basis for investigating 
how leaf-level physiological dynamics manifest at the canopy scale 
in other ecosystems. 

Much of what has been inferred about the behaviour of ecosystem 
photosynthesis, or ‘gross ecosystem production’ (GEP, defined as
ecosystem-scale photosynthesis minus photorespiration), and
‘daytime ecosystem respiration’ (DER) in forests derives from eddy
covariance measurements of their difference, denoted ‘net ecosys- 
tem exchange’ (NEE). NEE measurements have greatly advanced 
our understanding of carbon-cycle processes in terrestrial ecosys- 
tems, but the behaviours of GEP and DER have remained uncertain 
because eddy covariance does not distinguish one process from the 
other. In standard practice, a hypothesized response of GEP and/or 
DER to light, water, and/or temperature is used to make that distinc- 
tion and thereby partition NEE into GEP and DER. For example, 
the oldest, simplest, and still most commonly adopted hypothesis 
is that DER follows the same function of air or soil temperature as 
does night-time ecosystem respiration!*'’, which is directly observ- 
able as night-time NEE. Another common partitioning hypothesis 
is that DER follows a function of air temperature of the same form 
found to apply to night-time NEE (but not necessarily with the 
same parameter values), while GEP follows a saturating function of 


photosynthetically active radiation (PAR) of the same form found 
to apply to individual leaves’. 

To date there has been no means of testing these partitioning hypoth- 
eses, but they are nevertheless used in hundreds of studies each year. 
The most popular partitioning algorithm” alone has been cited more 
than 800 times since its debut in 2005, and similar methods have been 
explored since the onset of long-term eddy covariance measurements”. 
The patterns and environmental responses of GEP and DER obtained 
from such methods have been used to design and evaluate terrestrial 
biosphere models for estimating large-scale biosphere—atmosphere 
interactions and predicting climate change*”. Partitioning has also been 
applied to a range of ecosystems to evaluate various remote sensing 
indices that may enable aircraft and satellite measurements of regional 
and global biosphere productivity but have contrasting seasonal 
patterns*®!>!°, And partitioning has been used to investigate ecosystem 
light-use efficiency and water-use efficiency, and to evaluate related 
production efficiency models intended to estimate regional and global 
biosphere productivity'”"*. 

The general pattern observed via standard partitioning of NEE in 
temperate deciduous forests is that DER peaks or plateaus shortly after 
leaf expansion*!, one to two months ahead of the peak in belowground 
respiration’’. Meanwhile, the ecosystem photosynthetic light-use effi- 
ciency peaks shortly after leaf expansion and then gradually declines 
through the growing season—as might occur because of leaf ageing or 
water stress—until autumnal senescence’. 

To investigate the behaviour of GEP and DER, we partitioned three 
growing seasons of subhourly NEE measurements at the Harvard 
Forest into GEP and DER, on the basis of simultaneous eddy covar- 
iance measurements of the stable carbon isotopic composition 
(that is, ¹³C/¹²C) of NEE’. Our isotopic flux partitioning (IFP) algorithm
(see Methods) exploits the fact that GEP and DER have ¹³C/¹²C
signatures® that are almost always distinguishable in subhourly NEE 
given the high precision (Extended Data Fig. 1) of our recently devel- 
oped infrared laser spectrometer’ (see also Extended Data Fig. 2). We 
focus on comparisons with the most common standard partitioning 
algorithm”; comparisons with the partitioning method that incorpo- 
rates a photosynthetic light-response function are broadly similar and 
can be found in the Methods. 

Our analysis indicates that daytime ecosystem respiration differed 
fundamentally from standard predictions that were based on night-time
NEE and temperature’? (Fig. 1): DER was only about half as large
as night-time NEE in the first half of the growing season (June-July), 
but was roughly equal to night-time NEE in the second half (August- 
September). As belowground respiration typically varies by less than 
10% between daytime and night-time at this site”°, the large discrep- 
ancy between daytime and night-time ecosystem respiration in the 
first half of the growing season suggests inhibition of leaf respiration 
by light, known as the Kok effect®. Such inhibition has been found to 
occur at the leaf level in many plant species!°, including tree species!! 
Figure 1 | Composite diel cycles show that photosynthesis and daytime
respiration at the Harvard Forest are less than predicted in the first half 
of the growing season. a, b, Fluxes in the tower’s relatively homogeneous 
and well sampled southwest quadrant, averaged across the three years, 
2011-2013, for June to July (a) and August to September (b). Differences 
from the results of standard partitioning for the same data are shaded in 
blue. Lines connect means for each 2-hour bin (20 <n < 94). Partitioning 
is done for daylight periods only; GEP is set to zero in the dark (hatched 
areas). 


(although we know of no published studies in red oak, which domi- 
nates our site). We therefore asked: what would cause inhibition of leaf 
respiration to occur during the first half of the growing season only, 
and is the magnitude of the discrepancy consistent with the overall 
respiration budget? 

Previous work"? at this site on the seasonal patterns of aboveground 
and belowground respiration—a multiyear synthesis of more than 
100,000 flux tower and soil chamber observations—suggests that in 
June, at night, aboveground respiration typically accounts for more 
than 50% of total ecosystem respiration, but that this proportion 
declines to about 10% in August (Fig. 2b, gold line). June-July is the 
period in which leaves are still thickening after expansion'*'° and 
are presumably continuing growth respiration associated with that 
thickening, which might explain the elevated night-time respiration 
of the canopy. (In contrast, night-time canopy respiration is roughly 
stable from June through to September at a nearby conifer-dominated 
site'®.) We infer that aboveground respiration is strongly inhibited by 
light during the day at our site, causing DER to be about half as large 
as night-time ecosystem respiration in June-July but almost equal to 
night-time ecosystem respiration in August-September (Fig. 1). Near- 
complete inhibition of leaf respiration by light has been reported for 
individual leaves of some species!”, and so both the seasonal pattern 
and the magnitude of the discrepancy between daytime and night-time 

Figure 2 | Composite seasonal cycles of GEP and DER indicate 
strong inhibition of aboveground respiration by light and sustained 
photosynthetic efficiency. Results from isotopic partitioning, standard 
partitioning, and standard partitioning adjusted for 100% inhibition 
of aboveground respiration by light (see Methods), across all forest 
quadrants. a, GEP and DER. b, Discrepancy between standard and isotopic 
partitioning (black line), with the gold line showing the 1996-2009 mean 
seasonal pattern of aboveground respiration (Rapgq) estimated from soil 
chambers and night-time NEE”. c, Light-use efficiency (LUE; isotopic and 
standard partitioning), with absorbed photosynthetically active radiation 
(APAR) inverted in red. d, Intrinsic water-use efficiency ($\mathrm{WUE}_i$). Lines 
connect means over all daylight hours for each 12-day bin; pale bands 
show standard errors of the means calculated from variability within each 
bin (64 < n < 431). Bands are omitted from the 100%-inhibition lines for 
clarity. Hatched areas indicate periods of leaf expansion and abscission. 


ecosystem respiration are consistent with a plausible ecosystem-scale 
Kok effect. 

Standard partitioning calculates DER from night-time NEE without 
accounting for any inhibition of respiration by light, and so the season- 
ally varying discrepancy between night-time and daytime ecosystem 
respiration (as determined by isotopic partitioning) corresponds to a 
similar, seasonally varying discrepancy between standard and isotopic 
estimates of DER (Fig. 2a, b). However, standard partitioning—and 
therefore the discrepancy between standard and isotopic partitioning— 
can be complicated by horizontal heterogeneity in ecosystem 

Figure 3 | The ecosystem-scale light-response curve is invariant over 
the season. Responses of GEP (from isotopic and standard partitioning) 
to APAR, in June-July and August-September (before leaf abscission), 
averaged across all three years, for vapour pressure deficits (VPDs) of 
between 0 Pa and 1,200 Pa, in the flux tower’s southwest quadrant. Lines 
connect means for each APAR bin, and pale bands show standard errors 
of the means calculated from variability within each bin (14 < n < 63). 
Also shown (right-hand axis) are the leaf-level light-saturated net 
photosynthesis rates (that is, net leaf CO2-uptake rates) reported for the 
years 1991 to 1992 in ref. 26, based on 43 red oak leaves in mid July and 
44 red oak leaves in late August; the small increase from mid July to late 
August was also reported for red maple leaves in ref. 26. Bands are omitted 
from the August-September curves for clarity. 


respiration surrounding the flux measurement tower. Standard par- 
titioning uses all night-time fluxes in a 4- or 6-day time window to 
predict DER for each individual daytime flux measurement. The 
night-time fluxes typically correspond to many different sectors of the 
forest, according to the winds; but the individual daytime flux being 
partitioned corresponds to one particular sector. Therefore, standard 
partitioning effectively averages across sectors, underestimating DER 
for high-respiration sectors and overestimating it for low-respiration 
sectors®. Moreover, if the winds differ systematically between day and 
night, standard DER predictions can be further biased. At our site, 
ecosystem respiration was more than two times larger to the north- 
west of the flux measurement tower than it was to the southwest®, and 
the day-versus-night wind biases in our data set were such that the 
June-July discrepancy between standard and isotopic estimates of 
DER is understated when averaging across all sectors of the forest 
(as in Fig. 2), but exaggerated when averaging only within the southwest 
quadrant (as in Fig. 1). Without these sampling biases, the standard 
method would have overestimated DER by about 100% in June-July 
(see Methods). 
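
The sector-averaging effect described above can be illustrated with a toy calculation. This is a minimal sketch and not the standard partitioning algorithm itself: the function name, the two-sector set-up and the flux values are hypothetical, chosen only so that one sector respires twice as much as the other.

```python
def pooled_der_estimate(night_fluxes_by_sector):
    """Mean of all night-time fluxes in the window, ignoring wind sector.

    Loosely mimics how a windowed regression pools night-time data from
    every wind direction into a single respiration estimate.
    """
    pooled = [f for fluxes in night_fluxes_by_sector.values() for f in fluxes]
    return sum(pooled) / len(pooled)


# Hypothetical night-time respiration samples, grouped by wind sector.
night = {
    "northwest": [8.0, 8.0],  # high-respiration sector
    "southwest": [4.0, 4.0],  # low-respiration sector
}

estimate = pooled_der_estimate(night)

# Bias of the pooled estimate relative to each sector's own mean respiration.
bias = {
    sector: estimate - sum(fluxes) / len(fluxes)
    for sector, fluxes in night.items()
}

print(estimate)  # pooled estimate shared by every sector
print(bias)      # negative for the high sector, positive for the low sector
```

In this toy case the pooled value underestimates respiration for the high-respiration sector and overestimates it for the low-respiration sector, which is the direction of bias the text describes.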

We have thus far described only the period between the com- 
pletion of leaf expansion and the onset of autumnal senescence 
(June-September). Prior to leaf expansion, aboveground respiration 
already constitutes more than half of night-time ecosystem respiration, 
probably because of respiration associated with bud and leaf develop- 
ment, branch elongation, and wood production”. In this period, we 
found that daytime and night-time ecosystem respiration were roughly 
equal, so that the standard and isotopic partitioning methods agreed 
(Fig. 2a). This result is expected, as respiration should be inhibited by 
light only in photosynthesizing tissues. 

The revised seasonal pattern in DER corresponds to a new sea- 
sonal pattern in GEP, because DER and GEP are constrained to sum 
to the measured NEE in both partitioning methods. Whereas stand- 
ard partitioning suggested a gradual decline in the response of GEP 
to absorbed photosynthetically active radiation (APAR) through the 
growing season, isotopic partitioning showed that the response of GEP 
to APAR was stable between the completion of leaf expansion and the 
onset of autumnal senescence (Fig. 3). Neither GEP (Fig. 2a) nor the 
canopy light-use efficiency (LUE, being GEP/APAR) (Fig. 2c) deter- 
mined by isotopic partitioning exhibited the pronounced early-season 
peak that typifies standard estimates at this site and in temperate forests 
more generally’. Compared with isotopic partitioning, and aside from 
the directional sampling biases mentioned above, standard partitioning 
overestimated GEP by about 25% in June-July. 

Qualitatively, the observed saturating response of GEP to APAR 
(Fig. 3) and the resulting negative correlation between LUE and 
APAR (Fig. 2c) make physiological sense for three reasons. First, as 
the amount of light increases, the photosynthetic apparatus of a leaf 
becomes increasingly limited by other factors, such as CO2 supply”!. 
Second, cloudy or hazy conditions cause the amount of light to decline 
but the remaining light to be more diffuse and therefore spread more 
evenly among the canopy leaves, strongly increasing assimilation by 
otherwise shaded leaves but only mildly reducing assimilation by other- 
wise sunlit, light-saturated leaves”. Third, high-light conditions tend to 
be associated with drier air, which can lower LUE by inducing stomatal 
restriction of the CO2 supply. Quantitatively, the observed seasonal- 
scale relationship between isotopically determined LUE and APAR 
(r² = 0.75, P = 0.002 between the completion of leaf expansion and the 
onset of abscission; Fig. 2c and Extended Data Fig. 3) is expected, in 
the sense that it is consistent with relationships obtained from both 
standard and isotopic partitioning at shorter timescales (Extended Data 
Fig. 4). In contrast, LUE predicted by standard partitioning did not 
yield any significant seasonal-scale correlation with APAR (r² = 0.07, 
P = 0.39) (Fig. 2c and Extended Data Fig. 3). There would indeed be 
such a correlation were it not for the June-July LUE peak, which we 
argue is an artefact arising because standard partitioning does not 
capture the inhibition of leaf respiration by light. 
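
As a rough illustration of the quantities discussed above, the sketch below computes LUE as GEP/APAR and the squared Pearson correlation between LUE and APAR. The function names and the four bin values are hypothetical (they are not data from this study); the numbers are chosen only so that GEP saturates as APAR rises.

```python
def light_use_efficiency(gep, apar):
    """LUE = GEP / APAR for paired observations."""
    return [g / a for g, a in zip(gep, apar)]


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical bin means: GEP saturates with light, so LUE falls as APAR rises.
apar = [200.0, 400.0, 600.0, 800.0]
gep = [10.0, 15.0, 18.0, 20.0]

lue = light_use_efficiency(gep, apar)
print(lue)
print(r_squared(apar, lue))
```

A saturating light response of this kind necessarily gives a negative LUE-APAR relationship, which is why the absence of such a correlation in the standard estimates is notable.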

Another common metric of photosynthetic efficiency is canopy- 
intrinsic water-use efficiency ($\mathrm{WUE}_i$)”%, which is the ratio of carbon 
gained by photosynthesis (GEP) to water lost by transpiration ($E_T$), con- 
trolling for the water-vapour mole fraction gradient between the inside 
of the leaves and the air ($\Delta w$). Thus, $\mathrm{WUE}_i = 1.6 \times \Delta w \times \mathrm{GEP}/E_T$ 
(the factor of 1.6 in this definition accounts for the fact that water 
vapour diffuses 1.6 times more quickly than CO2). We found that $\mathrm{WUE}_i$ 
was more stable throughout the growing season than is predicted by 
standard partitioning (Fig. 2d), implying a stable relationship between 
canopy stomatal conductance and GEP. 
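
The definition of intrinsic water-use efficiency given above (1.6 times the water-vapour mole fraction gradient times GEP, divided by transpiration) can be written as a one-line function. This is only a sketch: the function name, argument names and input values are hypothetical, and the inputs are assumed to be in mutually consistent units.

```python
def intrinsic_wue(gep, transpiration, delta_w):
    """Intrinsic water-use efficiency: 1.6 * delta_w * GEP / transpiration.

    gep           -- gross ecosystem photosynthesis
    transpiration -- water lost through the stomata
    delta_w       -- leaf-to-air water-vapour mole fraction gradient
    """
    if transpiration <= 0:
        raise ValueError("transpiration must be positive")
    return 1.6 * delta_w * gep / transpiration


# Hypothetical values, for illustration only.
wue_i = intrinsic_wue(gep=20.0, transpiration=4.0, delta_w=0.01)
print(wue_i)
```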

In some forests, LUE is believed to decline before the onset of senes- 
cence owing to water stress”*. Some plants exhibit a decline in LUE 
as leaves age”’. In our forest, however, leaf-level gas-exchange meas- 
urements indicate little or no decline in photosynthetic capacity for 
our site-dominant tree species before senescence!”° (Fig. 3, red lines). 
Moreover, our observations show that the soil water content increased, 
and the atmospheric vapour pressure deficit decreased, from July to 
September (Extended Data Fig. 5): that is, water stress was decreasing 
throughout that period. Thus, LUE should not be expected to decline 
before senescence at our site, in accordance with our partitioning 
results. Indeed, given existing literature on the leaf-level Kok effect?! 
and on the seasonal patterns of aboveground respiration’? and leaf- 
level photosynthetic capacity at our site'>°, the seasonal patterns of 
DER and GEP derived from isotopic partitioning appear more con- 
sistent with present knowledge than do those obtained by standard 
partitioning. 

Our findings suggest a need to reappraise standard partitioning 
methods at other flux tower sites, in other ecosystems, to determine 
whether the patterns we have observed at the Harvard Forest are com- 
mon elsewhere. For example, in equatorial Amazonian forests, standard 
eddy-covariance-based estimates of GEP and LUE increase throughout 
the dry season. That increase contradicts some model predictions, but 
is supported by observations of increased leaf flushing at the start of 
the dry season*””®. Our findings, however, point to the possibility that 
some of the apparent seasonality in tropical GEP might actually be 
a partitioning artefact introduced by not accounting for seasonality 
in (light-inhibited) leaf respiration, with young leaves respiring more 
(at night), as here. 

The empirical understanding of carbon exchange between tem- 
perate forests and the atmosphere that underpins efforts to estimate 
global biosphere productivity*® and to predict climate change** holds 
that ecosystem respiration is greater during the day than at night, and 
that the canopy photosynthetic light-use efficiency gradually declines 
throughout the growing season. But the strong apparent light-mediated 
inhibition of canopy respiration and the invariance of the canopy pho- 
tosynthetic light response found here challenge that understanding. 
These phenomena also highlight the central role of leaf-level physi- 
ological dynamics in ecosystem-scale responses to the environment, 
and the potential benefit of incorporating more accurate representation 
of such dynamics into Earth-system models. Our study suggests that 
ecosystem-scale isotopic flux measurements could provide a general 
basis for exploring how leaf-level dynamics play out in ecosystems of 
varying composition and phenology around the world. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Data availability. The measurements used, as well as ancillary site data, are pub- 
licly available from the Harvard Forest Data Archive 
(http://harvardforest.fas.harvard.edu/harvard-forest-data-archive) under records 
HF209 and HF004. The measurements are also publicly available at 
ftp://saleskalab.eebweb.arizona.edu/, 
along with the analysis algorithms used. 

Site description. The measurements used here were acquired between May 2011 
and October 2013 at the Harvard Forest Environmental Measurements Site!*”?, 
which is situated in a mostly deciduous temperate forest dominated by red oak 
and red maple (with some hemlock and red pine) in Massachusetts, USA. Average 
seasonal cycles of key environmental and forest variables are shown in Extended 
Data Fig. 5. 

Measurements. Acquisition of isotopic fluxes (described previously’) was via a 
recently developed quantum cascade laser spectrometer that measures the isotopic 
composition of atmospheric CO2 with unprecedented accuracy and precision. The 
long-term (>3-year) reproducibility in 100-s measurements by this spectrometer 
is ±0.1‰ for δ13C and ±0.12‰ for δ18O (95% confidence interval) (Extended 
Data Fig. 1). The short-term (<3-hour) precision in 100-s measurements, which 
is the relevant metric for distinguishing isotopic signatures in updrafts and down- 
drafts and, hence, for using the eddy covariance method, is ±0.04‰ for δ13C 
and ±0.06‰ for δ18O (95% confidence interval). 

Environmental and forest variables used in this study (Extended Data Fig. 5) 
include: 

Plant area index. This was measured optically across a plot array in the flux tower 
footprint throughout the growing season (as part of routine site operations), inter- 
polated to our 40-minute time grid, and converted to leaf area index (LAI) as 
described elsewhere®. The period between the completion of leaf expansion and 
the onset of leaf abscission is defined as the time during which the LAI was greater 
than 95% of its stable summertime maximum. 
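
The 95% criterion above can be sketched as a simple threshold on an LAI time series. The function name and the LAI values are hypothetical, and the series maximum is used here as a stand-in for the stable summertime maximum, which is an assumption of this sketch rather than the procedure used in the study.

```python
def full_canopy_period(lai, threshold=0.95):
    """Return (first, last) indices at which LAI exceeds threshold * maximum.

    Assumption: the maximum of the series approximates the stable
    summertime maximum LAI.
    """
    lai_max = max(lai)
    above = [i for i, value in enumerate(lai) if value > threshold * lai_max]
    return above[0], above[-1]


# Hypothetical LAI values sampled at regular intervals over a growing season.
lai = [0.5, 2.0, 4.0, 4.9, 5.0, 5.0, 4.8, 3.0, 1.0]

print(full_canopy_period(lai))
```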

Absorbed photosynthetically active radiation (APAR). This is calculated as the 
difference between above-canopy PAR and below-canopy PAR (that is, neglecting 
reflection), the latter of which was calculated as the mean of six measurements 
(using Li-Cor LI-190S quantum sensors mounted 1 metre above the ground) 
distributed within the flux tower footprint. 
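
The APAR calculation described above amounts to subtracting the mean below-canopy PAR from the above-canopy PAR. A minimal sketch follows; the function name and the sensor readings are hypothetical, and reflection is neglected as stated in the text.

```python
def absorbed_par(par_above, par_below_sensors):
    """APAR = above-canopy PAR minus the mean of the below-canopy sensors."""
    par_below = sum(par_below_sensors) / len(par_below_sensors)
    return par_above - par_below


# Hypothetical readings: one above-canopy value and six below-canopy sensors.
apar = absorbed_par(1500.0, [100.0, 120.0, 80.0, 90.0, 110.0, 100.0])
print(apar)
```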

Soil temperature. This was calculated as the mean of measurements from eight 
copper-constantan thermocouples buried 10cm below the ground surface, 
distributed within the typical flux tower footprint. 

Volumetric soil moisture. This was measured for the depth range 0-30 cm at four 
locations in the typical flux tower footprint. 

Isotopic flux partitioning. Since its original exposition”, the isotopic partitioning 
of NEE has been tested in several ecosystems*’ ** but has not been generally 
adopted owing to instrumental limitations. Motivated by the improvements in field 
instrumentation described above, we recently extended the theory of isotopic flux 
partitioning (IFP), and successfully demonstrated its application at the Harvard 
Forest for the first year of our measurements, quantifying its uncertainties and 
potential biases®. Here we briefly recapitulate our method. 

The basic idea of IFP is to determine GEP and DER using their isotopic signa- 
tures and the magnitude and isotopic composition of their residual, NEE; in other 
words, to solve the following set of two mass balance equations for the two 
unknowns, DER and GEP, where $\delta_{\mathrm{NEE}}$, $\delta_{\mathrm{DER}}$ and $\delta_{\mathrm{GEP}}$ describe the ratios 
(expressed relative to a standard material) of 13C to 12C in NEE, DER, and GEP, 
respectively: 

$$\mathrm{NEE} = \mathrm{DER} - \mathrm{GEP}$$
$$\delta_{\mathrm{NEE}}\,\mathrm{NEE} = \delta_{\mathrm{DER}}\,\mathrm{DER} - \delta_{\mathrm{GEP}}\,\mathrm{GEP}$$

Because the isotopic signature of GEP (that is, $\delta_{\mathrm{GEP}}$) varies substantially during the 
day, additional equations based on leaf anatomy and biochemistry are used to 
express the isotopic signature of GEP in terms of GEP itself, chiefly via the 
canopy-integrated stomatal conductance, determined from heat and water fluxes 
measured by eddy covariance®. In our recently extended theory of isotopic 
partitioning’, the two equations above are replaced by a system of six equations in 
which DER is broken down into foliar and non-foliar respiration, and GEP is 
broken down into gross photosynthesis and photorespiration. The breakdown is 
necessary because each of these four processes has its own isotopic signature. 
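
For intuition, the basic two-equation form (the flux balance NEE = DER - GEP and the corresponding isotopic balance weighted by each flux's signature) can be solved in closed form if the signature of GEP is treated as known. This is a simplified sketch, not the six-equation algorithm used in the study, in which the signature of GEP itself depends on GEP; the function name and all numerical values are hypothetical.

```python
def partition_two_equation(nee, delta_nee, delta_der, delta_gep):
    """Solve NEE = DER - GEP and
    delta_nee * NEE = delta_der * DER - delta_gep * GEP for (DER, GEP).

    Substituting DER = NEE + GEP into the isotopic balance gives
    GEP = NEE * (delta_nee - delta_der) / (delta_der - delta_gep).
    """
    denominator = delta_der - delta_gep
    if denominator == 0:
        # With identical signatures the two equations are not independent.
        raise ValueError("signatures of DER and GEP must differ")
    gep = nee * (delta_nee - delta_der) / denominator
    der = nee + gep
    return der, gep


# Hypothetical forward problem: choose fluxes and signatures, then derive NEE.
true_der, true_gep = 5.0, 20.0
delta_der, delta_gep = -26.0, -28.0  # per mil, illustrative only
nee = true_der - true_gep
delta_nee = (delta_der * true_der - delta_gep * true_gep) / nee

# Inverting the balance equations recovers the fluxes we started from.
der, gep = partition_two_equation(nee, delta_nee, delta_der, delta_gep)
print(der, gep)
```

The division by the difference between the two signatures shows why the method needs isotopically distinct photosynthesis and respiration: as the signatures converge, small errors in the measured quantities are amplified.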

The isotopic signature of non-foliar respiration is the only signature that 
is measured, by a combination of soil chamber and night-time Keeling plot 
measurements*. This method is weighted strongly to belowground respiration®. 
To the degree that the isotopic signature of non-foliar aboveground respiration 
differed from that of belowground respiration, our isotopic signature of overall 
non-foliar respiration would therefore have been in error. However, we showed 
previously that the partitioning is not very sensitive to that signature: a relatively 
large 1‰ error in the signature of overall non-foliar respiration leads to an error 
of just 3% in GEP®. 

Although our IFP algorithm® treats foliar and non-foliar respiration separately 
because of their independent isotopic signatures, the rates of foliar and non-foliar 
respiration cannot be determined separately from one another with confidence by 
IFP (at least at present) because neither is directly measurable or predictably related 
to other variables—as, for example, photorespiration is related to photosynthesis 
by the photocompensation point (which is measurable) and DER is related to pho- 
tosynthesis and photorespiration by NEE (which is measurable). In other words, 
we have enough information to solve the equations for DER but not to apportion 
DER to foliar and non-foliar sources. The attribution of patterns in DER to patterns 
in foliar respiration in the main text is therefore based instead on comparison to 
belowground respiration measured by soil chambers. 

For the purposes of the calculation, we use a plausible a priori value for the 
rate of foliar respiration, and test how sensitive the GEP-versus-DER partitioning 
results are to this value. We find that the choice of a priori value, within plausible 
bounds, is inconsequential for the values of GEP and DER output by IFP (Extended 
Data Fig. 6), as was also found previously*. The partitioning of NEE into GEP and 
DER is thus robust even if IFP’s internal foliar/non-foliar apportioning is not. 
The plausible bounds for daytime foliar respiration are approximately: (1) zero, 
and (2) the total night-time aboveground respiration, which was determined by 
a multiyear synthesis of flux tower and soil chamber data in the typical flux tower 
sampling footprint at this site’. For the partitioning in the main text, we set the 
a priori value in the middle of the range; that is, for each daytime flux measurement, 
we calculated leaf respiration as 0.5 x p x NER, where NER is night-time ecosystem 
respiration (that is, night-time NEE) for the same wind direction, and p is the 
(seasonally varying) proportion of NER that is aboveground respiration according 
to the multiyear synthesis mentioned above. In this case, Extended Data Fig. 6 
shows that the maximum possible associated error in GEP at any time of the grow- 
ing season is 3.5%. Given the insensitivity of GEP and DER to the a priori rate of 
foliar respiration, and given that the consideration of foliar respiration in the main 
text is based on comparison of DER to other kinds of measurement rather than 
on IFP’s internal apportioning, the a priori rate of foliar respiration does not affect 
our findings concerning the inhibition of foliar respiration (or any other findings). 
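
The a priori foliar respiration described above, 0.5 x p x NER with plausible bounds of zero and p x NER, can be sketched as follows. The function and argument names are hypothetical, as are the example values.

```python
def foliar_respiration_prior(ner, aboveground_fraction):
    """Return (lower bound, a priori value, upper bound) for daytime
    foliar respiration.

    ner                  -- night-time ecosystem respiration
    aboveground_fraction -- proportion p of NER that is aboveground
    """
    lower = 0.0
    upper = aboveground_fraction * ner  # total night-time aboveground respiration
    prior = 0.5 * upper                 # middle of the plausible range
    return lower, prior, upper


# Hypothetical example: NER of 6.0 with half of it coming from aboveground.
print(foliar_respiration_prior(ner=6.0, aboveground_fraction=0.5))
```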

With a measured signature of non-foliar respiration and an a priori rate of foliar 
respiration, the other six variables can be determined simultaneously by solution of 
the six equations. The equations neglect variation among the leaves within the can- 
opy with regard to physiology and microenvironment, a source of uncertainty in 
the method that (as we argue in the section ‘Uncertainty in the seasonal patterns’ 
below and in Extended Data Fig. 7) is unlikely to significantly affect our findings. 

Successful application of isotopic partitioning requires that the isotopic signa- 
tures of GEP and DER be distinct (otherwise the partitioning equations above are 
not independent). That is the case predominantly because of the large (4‰) diurnal 
variation in the signature of GEP and the small (<0.25‰) diurnal variation in the 
signature of DER. But we also see consistently distinct signatures even at seasonal 
timescales averaged over three years (Extended Data Fig. 2). 

A minor alteration to the previous method’, enabled by our larger data set here 
(three growing seasons instead of one), concerns the empirical function used to 
interpolate eddy-covariance-based estimates of stomatal conductance into periods 
with significant evaporation (for example, from the soil surface), which would 
otherwise contaminate the stomatal conductance estimation because it is a water 
flux that does not pass through the stomata. Previously’, only one year of data 
was used. Here the function was fit using all three years of available data, and the 
resulting set of fitted parameters was still able to closely reproduce the measured 
half-hourly water flux (r² = 0.92) in selected evaporation-free periods® that we used 
for testing the method. The strong ability to reproduce the water flux gives us high 
confidence that stomatal conductance values are well estimated during evaporative 
periods, and that IFP can therefore be extended to these times. 

Note that some parameters used for isotopic partitioning (for example, mes- 
ophyll conductance) might be inaccurate during periods of leaf expansion (that 
is, increase in leaf area following budburst) or autumnal senescence (that is, 
coloration and abscission), shown as hatched areas in Fig. 2 and Extended Data 
Figs. 2 and 5-10. 

Standard partitioning. For comparisons, we followed the standard partition- 
ing algorithms exactly as described'*"*, except that we used slightly longer time 
windows for building regressions in order to substantially reduce the number of 
regressions that were based on only a few data points (we used 6-day windows 
where 4-day windows were prescribed, and 15-day windows where either 12- or 
15-day windows were prescribed). All partitioning methods partitioned the same 
set of NEE measurements. 

Sampling biases in standard partitioning. There are two competing sampling 
biases in the standard partitioning method used in the main text!3, both of which 
are related to the fact that both night-time and daytime ecosystem respiration 
(the latter as determined by IFP) were more than two times higher in areas to the 
northwest of the flux measurement tower than they were to the southwest® (GEP 
from IFP was also higher to the northwest, by 28%). The Harvard Forest is not odd 
in this regard; directional variation in the flux measured by eddy covariance can 
result from true ecosystem heterogeneity or from correlation of environmental 
conditions with wind direction, caused by synoptic weather patterns. 

The first and larger bias arises because the standard method calculates DER 
from all night-time flux measurements in a 4- or 6-day time window; thus standard 
partitioning of a given flux measurement, corresponding to a given wind direction, 
is influenced by fluxes from many wind directions. This bias leads the standard 
method to overestimate DER in the southwest quadrant and underestimate it in 
the northwest quadrant; however, the bias is negligible if estimates of DER from all 
quadrants are averaged (as in Fig. 2). Restricting the standard partitioning method 
to use night-time fluxes from only one quadrant in its regressions would require 
expansion of the time window to almost a month in order to maintain sufficient 
points for each regression, which would raise concerns about seasonal variation 
biasing the method”. The isotopic method, on the other hand, partitions each flux 
measurement independently and thereby captures the variation of DER and GEP 
between different areas of the forest®. 

The second bias arises because natural wind patterns caused the night-time 
and daytime fluxes to be associated with different areas of the forest. In particular, 
in June and July (but not in August or September), the high-respiration areas to 
the northwest were sampled commonly during the daytime but rarely at night. 
This bias leads the standard method to underestimate DER in June and July but 
not in August or September. Following the discrepancy between night-time NEE 
and DER from IFP, the standard partitioning method should overestimate DER 
from IFP by about 100% in June-July. Instead, when estimates of DER from all 
quadrants are averaged to remove the first and larger bias (discussed above), 
this second bias reduces that overestimation to about 35%. If our forest were 
more homogeneous, the standard method would have overestimated DER by 
about 100%. 

We include all quadrants in the averages in Fig. 2 because it is focused on 
comparing the two partitioning methods, and the second bias is smaller than the 
first. On the other hand, we restrict Figs. 1 and 3 to the relatively homogeneous 
southwest quadrant because they are focused on the day-night difference in 
respiration and on the light-response of GEP, which are more accurately portrayed 
without the variability associated with sampling different quadrants. 

Standard partitioning adjusted for 100% inhibition of aboveground respiration by 
light. The dashed blue lines in Fig. 2 were obtained by scaling DER from standard 
partitioning by the seasonally varying ratio of belowground respiration to total 
ecosystem respiration in ref. 19, which was estimated from soil chambers and 
night-time NEE. 
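
The adjustment described above is a simple rescaling of standard DER by the belowground share of ecosystem respiration. The sketch below assumes that total respiration is the sum of aboveground and belowground parts, so that the belowground ratio is one minus the aboveground fraction; the function name and the values are hypothetical.

```python
def adjust_der_for_full_inhibition(der_standard, aboveground_fraction):
    """Scale standard DER by the belowground share of total respiration.

    Both arguments are sequences over the same time bins.
    Assumption: belowground share = 1 - aboveground fraction.
    """
    return [
        der * (1.0 - fraction)
        for der, fraction in zip(der_standard, aboveground_fraction)
    ]


# Hypothetical seasonal bins: aboveground share is large early, small later.
adjusted = adjust_der_for_full_inhibition([6.0, 5.0], [0.5, 0.1])
print(adjusted)
```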

Comparison with partitioning based on light-response curves. Regression-based 
partitioning incorporating a photosynthetic light-response function in addition to 
a respiratory temperature-response function", like the standard regression-based 
method discussed in the text!*, overestimated DER from IFP in July; however, 
it diverged from IFP and the more standard method at other times of the year 
(Extended Data Fig. 8). At this site, there is large horizontal variation in ecosystem 
respiration, so that changes in the flux tower sampling footprint (that is, in wind 
speed or direction) can cause the measured respiration to vary by a factor of 3 or 
more’. The dual-function method can only interpret such variation as being driven 
by PAR, sometimes leading to unrealistic light-response curves and erratic results®; 
examples are evident in August and October in Extended Data Fig. 8. Also like the 
standard method discussed in the text, dual-function partitioning yielded a light- 
use efficiency that showed no significant correlation with APAR on the seasonal 
scale (r² = 0.01, P = 0.85) (Extended Data Figs. 3 and 8), although it did correlate 
with APAR on shorter timescales (Extended Data Fig. 9). 

Uncertainty in the seasonal patterns. Because the estimated systematic 
uncertainty in IFP at this site exceeds 17% of GEP® (1 standard error), we 
examined the potential for errors in IFP to produce the observed seasonal 
pattern of disagreement with standard partitioning. The two largest sources of 
uncertainty in IFP—the only two that could account for a 10% error in GEP® 
in the period after leaf expansion and before senescence—are: (1) neglect of 
heterogeneity among the canopy leaves and their microenvironments, that 
is, the ‘big-leaf’ assumption, and (2) the prescribed isotopic fractionation by 
enzyme-catalysed fixation of CO2, which is based on measurements in only a 
few non-forest species. 

Error owing to the big-leaf assumption would depend on the physical and 
physiological structure of the canopy and on the distribution of the light source 
(that is, the angle of the sun and the diffuse light fraction). Because the canopy 
structure does not vary appreciably between full leaf expansion and the onset of 
senescence, we can minimize any error owing to the big-leaf assumption during 
that period by restricting our analysis to data acquired under fully diffuse 
light, that is, when the light source was distributed uniformly over the sky. We 
found that the seasonal pattern of the disagreement between IFP and stand- 
ard partitioning is not qualitatively changed by restricting the analysis to fully 
diffuse light conditions (diffuse light fraction >0.9), or by systematic error in the 
isotopic fractionation by enzyme-catalysed fixation of CO2 (aside from a shift 
in the mean disagreement; Extended Data Fig. 7). Thus the seasonal pattern of 
disagreement between IFP and standard partitioning is unlikely to be an artefact 
of uncertainty in IFP. 

Random errors. The standard error bands in Figs. 2 and 3 are based on total 
variability within each averaging bin and therefore represent an upper bound 
on the effect of random measurement error. The random error in the isotopic 
partitioning of individual NEE measurements has previously been shown’ to be 
negligible compared with the random error in the eddy covariance measurement 
of NEE itself. 
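
The standard error bands described above are computed from the variability of all values within each averaging bin. A minimal sketch of that calculation for one bin is given below, assuming the usual definition of the standard error of the mean (sample standard deviation divided by the square root of n); the function name and the values are hypothetical.

```python
import math
import statistics


def bin_mean_and_sem(values):
    """Mean and standard error of the mean for the values in one bin."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample s.d. over sqrt(n)
    return mean, sem


# Hypothetical flux values falling in a single averaging bin.
mean, sem = bin_mean_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, sem)
```

Because the within-bin spread includes real variation in the fluxes as well as measurement noise, a band computed this way is an upper bound on the effect of random measurement error, as the text notes.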

Ogée et al.*° once argued, using Bayesian methods, that even with improved 
instrumentation, the random error in isotopic partitioning using 13C would remain 
so large that the partitioning would be useful only when averaging over many 
individual flux measurements. The rationale for the large error estimate was that 
the typical isotopic disequilibrium (~2‰) is not much larger than the random 
measurement error in the isotopic signatures of photosynthesis and respiration, 
and in the isotopic composition of NEE. However, Ogée et al.’s analysis did not 
account for correlations between the errors in these various isotopic quantities. 
Because the photosynthetic signature is determined by the partitioning equations 
(as opposed to being measured), random error in the photosynthetic signature 
derives entirely from, and is therefore correlated with, random error in the meas- 
ured variables (especially in the isotopic composition of NEE). That correlation 
reduces the measurement-derived random error in DER and GEP relative to Ogée 
et al.’s estimates®”, as confirmed by the actual variability in the retrieved gross fluxes 
not only here but also in all previous IFP studies*!~*’. That variability demonstrates 
that 13C measurements are sufficiently precise to partition NEE measurements 
robustly on an hourly basis. 

The effect of error in the isotopic composition of NEE on error in the photo- 
synthetic signature using our equations was determined by sensitivity analysis 
and shown in Fig. 9 of ref. 8, as was the relative insensitivity of GEP to error in the 
isotopic composition of NEE and other measured variables. 

Comparison with other methods for estimating GEP. There are other methods 
being developed for estimating GEP, including those based on measurements 
of carbonyl sulfide (OCS), whose uptake by plants has been taken as an index 
of photosynthetic CO2 assimilation”, and of sun-induced chlorophyll fluores- 
cence (SIF)*!, which is correlated with leaf photochemistry. Both OCS and SIF 
have been measured by others at or near this site for part of the period studied here 
(Extended Data Fig. 10). However, there are issues with both OCS and SIF that 
currently prevent these promising methods from estimating the seasonal pattern 
of GEP at the 5% uncertainty level needed to corroborate our present results. 
In the case of OCS, one issue is the occurrence of large unexplained emission 
of OCS at this site*? (Extended Data Fig. 10). Another is that the uptake of OCS 
depends on stomatal conductance and carbonic anhydrase content, not on 
photosynthesis directly*’. In the case of SIF, the SIF-GEP relationship depends 
on the degree to which light is limiting photosynthesis (which depends on plant 
stress and carboxylation capacity)” as well as on the proportion of sunlit and 
shaded leaves seen by the SIF sensor*!, which varies with sun angle, diffuse light 
fraction, and canopy structure. 

Extended Data Figure 1 | Accuracy and precision of long-term 
isotopic measurements. Repeated quantum cascade laser spectrometer 
measurements (dots, each representing a measurement integrated 
over 100 s) of the isotopic compositions δ¹³C and δ¹⁸O in a single 
known reference cylinder, but measured as if it were an unknown value 
interspersed among the three years of routine atmospheric measurements. 
Known reference cylinder values are indicated by the solid grey lines, with 
95% confidence intervals indicated by the grey regions. Except for a period 
in September 2011 (between the vertical orange lines) when an inferior 
instrument thermal regulation scheme was tested, the precision of the 
spectrometer’s rapid, in situ isotope measurements is seen to be better than 
that obtained for the reference cylinder by laboratory-based isotope ratio 
mass spectrometry’. 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 2 panels: isotopic quantities (‰) plotted against month, May to October; 2011-2013 composite.] 


Extended Data Figure 2 | Composite seasonal cycles of isotopic 
compositions, isotopic discrimination and isotopic disequilibrium. 
Shown are: the isotopic composition of CO₂ in the canopy airspace, δa; 
the apparent fractionation by net photosynthetic assimilation (also 
called discrimination), ε; the isotopic signatures of net photosynthetic 
assimilation, δA, and non-foliar respiration, δNFR; and the isotopic 
disequilibrium, D = δNFR − δA. Dark lines connect flux-weighted means over 
all daylight hours for each 12-day bin, except in the case of δNFR, where 
the lines connect simple means over all night-time hours for each bin 
(because δNFR is derived from night-time Keeling plots rather than daytime 
flux measurements). Light shaded bands show standard errors in the 
flux-weighted means, calculated according to the ratio variance approximation 
recommended in ref. 43 (or just standard errors in the means for δNFR), and 
based on variability within each bin (64 < n < 431 for daylight bins, and 
16 < n < 33 for δNFR). Hatched areas indicate periods of leaf expansion and 
abscission. 
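
The flux-weighted means and their standard errors described in this caption can be sketched as follows. This is a minimal illustration, assuming that the ratio variance approximation has the form commonly attributed to Gatz and Smith for weighted means; whether that matches the exact expression in ref. 43 is an assumption, and the function and argument names are hypothetical.

```python
import math


def flux_weighted_mean_and_se(values, fluxes):
    """Return the flux-weighted mean of `values` and an approximate
    standard error from a ratio-variance formula.

    Assumption: this is a Gatz-and-Smith-style approximation; it may
    differ in detail from the expression used in the paper's ref. 43.
    """
    n = len(values)
    if n < 2 or n != len(fluxes):
        raise ValueError("need two or more paired values and fluxes")

    w_sum = sum(fluxes)
    w_bar = w_sum / n
    # Weighted mean: sum(w * x) / sum(w).
    x_w = sum(w * x for w, x in zip(fluxes, values)) / w_sum

    # Deviations of the weighted values and of the weights themselves.
    a = [w * x - w_bar * x_w for w, x in zip(fluxes, values)]
    b = [w - w_bar for w in fluxes]

    var = (n / ((n - 1) * w_sum ** 2)) * (
        sum(ai * ai for ai in a)
        - 2.0 * x_w * sum(ai * bi for ai, bi in zip(a, b))
        + x_w ** 2 * sum(bi * bi for bi in b)
    )
    return x_w, math.sqrt(max(var, 0.0))
```

With equal fluxes the expression reduces to the ordinary standard error of the mean, which is a convenient sanity check before applying it to real hourly data.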




[Extended Data Figure 3 plot: LUE (µmol mE⁻¹) against APAR (mE m⁻² s⁻¹). Fit statistics shown in the panel: 
Isotopic partitioning: r² = 0.75; p = 0.003; intercept = 33.4 ± 2.5; slope = −12.8 ± 2.8 
Partitioning from ref. 13: r² = 0.07; p = 0.48; intercept = 27.2 ± 6.0; slope = −5.0 ± 6.6 
Partitioning from ref. 14: r² = 0.01; p = 0.82; intercept = 24.5 ± 9.6; slope = −2.5 ± 10.7] 


Extended Data Figure 3 | Relationships of LUE to APAR, from our 
isotopic partitioning and from both standard methods. Scatterplot 
of the LUE and APAR data from Fig. 2c (solid black circles), along with 
ordinary least-squares linear fits (black lines), for the period from full leaf 
expansion to the onset of senescence. These results are from partitioning 
based on isotopes. Also shown are results from the standard method of 
ref. 13 (hollow blue triangles), and from the partitioning method of ref. 14 
(hollow yellow squares). 
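
The ordinary least-squares fits mentioned in this caption, with a slope and intercept each reported with an uncertainty, can be sketched as below. This is a generic textbook calculation, not the authors' code; it assumes the quoted uncertainties are standard errors, and the function name and returned keys are hypothetical.

```python
import math


def ols_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, with
    standard errors of both coefficients and the coefficient of
    determination r^2. Assumes x is not constant and y is not constant."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need three or more paired points")

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual sum of squares and residual variance (n - 2 degrees of freedom).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": math.sqrt(s2 / sxx),
        "se_intercept": math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx)),
        "r2": 1.0 - sse / syy,
    }
```

Dividing a fitted slope by its standard error gives the t statistic on n − 2 degrees of freedom, which is how a p value for the slope would normally be obtained.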

Extended Data Figure 4 | Relationships of LUE to APAR within each 
month. Daily LUE is plotted against daily APAR, averaged by day of year 
across all three years, on the basis of isotopic partitioning (solid circles) 
and the standard method of ref. 13 (hollow squares), and plotted separately 
for June, July, August and September. Also shown are linear (ordinary 
least-squares) fits for the isotopic (solid line) and standard (dotted line) 
partitioning methods. 

Extended Data Figure 5 | Composite seasonal cycles of environmental 
variables. Shown are leaf area index (LAI), leaf temperature (Leaf T), 
APAR, diffuse light fraction, leaf-air water vapour pressure difference 
(VPD,), canopy stomatal conductance (g,), volumetric soil water content 
(SWC), and soil temperature at 10 cm depth (Soil T), averaged across 
the three years, 2011-2013. Lines connect means over all daylight hours 
within each 12-day bin, and grey bands show standard errors in the means, 
calculated from variability within each bin (64 < n < 431). Air temperature 
(Air T), PAR, and the atmospheric water vapour pressure deficit (VPD) 
are also shown, as dotted lines. Hatched areas indicate leaf expansion and 
abscission. 

Extended Data Figure 6 | Effect of varying the prescribed rate of foliar 
respiration on the seasonal patterns of GEP and DER. As for Fig. 2, 
but with the black lines thickened to show the range of GEP and DER 
values that result from prescribing between 0% and 100% inhibition of 
leaf respiration by light. The grey standard error bands in Fig. 2 have 
been removed here for clarity. Hatched areas indicate leaf expansion and 
abscission. a, GEP and DER. b, Discrepancy between standard and isotopic 
partitioning (black line), with the gold line showing the 1996-2009 mean 
seasonal pattern of aboveground respiration (Rapga) estimated from soil 
chambers and night-time NEE”. c, Light-use efficiency (LUE; isotopic and 
standard partitioning), with absorbed photosynthetically active radiation 
(APAR) inverted in red. d, Intrinsic water-use efficiency (WUEi). 

Extended Data Figure 7 | Sensitivity of the seasonal cycles of GEP and 
DER to change in the isotopic fractionation by the photosynthetic 
enzyme Rubisco, and to restriction to diffuse light conditions. This 
figure compares the composite seasonal cycles (across the three years, 
2011-2013) of GEP and DER obtained from three variations of the 
IFP method (restricted to the southwest quadrant to reduce spurious 
discrepancies caused by differences in the flux tower sampling footprint 
when subsampling for diffuse light fraction). Top panel: GEP and DER 
from IFP. Bottom panel: the discrepancy between values of DER obtained 
from standard partitioning (based on night-time NEE and temperature), 
and values obtained from isotopic partitioning. The IFP variations shown 
are: as described in the text (solid black lines); restricted to periods with 
diffuse light fractions greater than 90% (solid orange lines); and using 
27‰ instead of 29‰ for the isotopic fractionation by Rubisco-catalysed 
fixation of CO₂ (dotted purple lines). The lines connect the means (which 
are from all daylight hours) for each 12-day bin. The light shaded bands 
around each line in the top panel show the standard error of the mean, 
calculated from the variability within each bin (25 < n < 130). Hatched 
areas indicate periods of leaf expansion and abscission. 

Extended Data Figure 8 | Composite seasonal cycles, from isotopic 
partitioning and from both standard partitioning methods. Shown are 
results from isotopic partitioning (solid black); from standard partitioning 
based on night-time NEE and temperature (dotted blue); and from 
standard partitioning incorporating a photosynthetic function of light 
(dotted yellow). a, GEP and DER. b, LUE, with APAR inverted in red. 
c, WUEi. Lines connect means over all daylight hours for each 12-day bin; 
pale bands show standard errors of the means calculated from variability 
within each bin (64 < n < 431). 

Extended Data Figure 9 | Seasonal cycles from isotopic partitioning and 
from both standard partitioning methods, for individual years. As for 
Extended Data Fig. 8, but showing the individual years separately (8 < n < 204). 

Extended Data Figure 10 | Comparison of GEP values obtained 
from isotopic partitioning with preliminary estimates based on 
measurements of carbonyl sulfide and solar-induced fluorescence. 
Seasonal patterns of GEP from IFP (solid black) and from the standard 
method of ref. 13 (dotted blue) are compared with those of the OCS 
flux in 2011 (dashed purple, on an inverted scale) and the SIF signal in 
2013 (dashed red). Lines connect means for each 12-day bin, and pale 
bands show standard errors of the means calculated from variability 
within each bin (10 < n < 209). The analysis included only data points for 
which simultaneous GEP and OCS, or GEP and SIF, measurements were 
available. The OCS data were provided by Commane et al.*°, and the SIF 
data by Yang et al. 

Interdisciplinary research has consistently lower funding success 

Lindell Bromham¹, Russell Dinnage¹ & Xia Hua¹ 


Interdisciplinary research is widely considered a hothouse for 
innovation, and the only plausible approach to complex problems 
such as climate change’”. One barrier to interdisciplinary research 
is the widespread perception that interdisciplinary projects 
are less likely to be funded than those with a narrower focus**. 
However, this commonly held belief has been difficult to evaluate 
objectively, partly because of lack of a comparable, quantitative 
measure of degree of interdisciplinarity that can be applied to 
funding application data!. Here we compare the degree to which 
research proposals span disparate fields by using a biodiversity 
metric that captures the relative representation of different fields 
(balance) and their degree of difference (disparity). The Australian 
Research Council’s Discovery Programme provides an ideal test 
case, because a single annual nationwide competitive grants 
scheme covers fundamental research in all disciplines, including 
arts, humanities and sciences. Using data on all 18,476 proposals 
submitted to the scheme over 5 consecutive years, including 
successful and unsuccessful applications, we show that the greater 
the degree of interdisciplinarity, the lower the probability of being 
funded. The negative impact of interdisciplinarity is significant 
even when number of collaborators, primary research field and 
type of institution are taken into account. This is the first broad- 
scale quantitative assessment of success rates of interdisciplinary 
research proposals. The interdisciplinary distance metric allows 
efficient evaluation of trends in research funding, and could be used 
to identify proposals that require assessment strategies appropriate 
to interdisciplinary research’. 

The 5th Annual Meeting of the Global Research Council in New 
Delhi in May 2016 focused on interdisciplinarity as one of its main 
topics of concern, reflecting increasing interest in research that breaks 
free of traditional discipline boundaries, and the growing concern that 
interdisciplinary research is not adequately supported under current 
funding structures. Funding agencies play a key role in shaping inter- 
disciplinary research’, with both positive influence, such as dedicated 
programmes for interdisciplinary projects, and negative impacts, as 
perceived biases can discourage submission of interdisciplinary pro- 
posals to open funding calls. This leads to the ‘paradox of interdisci- 
plinarity’: interdisciplinary research is often encouraged at policy level 
but poorly rewarded by funding instruments’. There is a clear need to 
test the widely held belief that interdisciplinary proposals fare poorly in 
competitive funding rounds: confirmation could prompt examination 
of evaluation strategies for interdisciplinary projects, while rejection of 
this claim might encourage more interdisciplinary proposals. 

Critical to evaluation of current practice is the ability to compare 
levels of interdisciplinarity of research projects to track trends, evaluate 
outputs and compare success rates®’. Measures of interdisciplinarity 
have typically relied on textual references, detecting use of words such as 
‘interdisciplinarity’’, or bibliometric analysis, tracking patterns of author 
affiliation® or citations within publications“. But these approaches 
are limited in use for evaluating funding applications. Interpretation 
of the terms ‘multidisciplinary’, ‘cross-disciplinary’, ‘interdisciplinary’ 
and ‘transdisciplinary’ varies widely'', and researchers will differ 
in their inclination to label their research as ‘interdisciplinary’, 
particularly if they perceive that identification as interdisciplinary 
influences funding outcomes!*. Because bibliometric analyses are pri- 
marily applied to publications!!, they may be of limited applicability 
in assessing funding proposals, where the outputs have not yet been 
published, and citations may not be in an analysable format. The lack of 
clear definitions and objective analyses is an impediment to evaluating 
the relative success of interdisciplinary proposals'. Conflicting findings 
have been reported using a range of approaches’, and most studies 
of funding success of interdisciplinary research have selected only a 
sample of proposals for evaluation!>!®. What is needed is a measure 
of the degree to which a proposal spans many different disciplines, 
independent of use of words such as ‘interdisciplinarity’ and without 
relying on cited publication data. 

Although no single metric will capture all salient aspects of interdis- 
ciplinarity, developing a simple measure of the disciplinary spread of 
research proposals does provide a tractable way to compare the relative 
success of proposals having a narrow disciplinary focus with those with 
a broader research programme”. To this end, we use information sup- 
plied on funding applications to score each proposal on the disparity 
and balance of the component disciplines'”. We base our analysis on 
methods established in evolutionary biology to account for relatedness 
between biological lineages, but instead of using an evolutionary tree 
(phylogeny), we use a hierarchical classification of research fields. This 
metric can be applied to any funding scheme where multiple discipline 
categories can be selected by applicants or identified from proposal 
documents’. 

We calculated this interdisciplinary distance (IDD) metric for all 
proposals submitted to the Australian Research Council Discovery 
Programme between 2010 and 2014 (Supplementary Table 1). This 
national competitive grants scheme funds fundamental research in 
all academic fields, receiving approximately 3,500 proposals in each 
annual funding call, with success rates being around 15-20% of pro- 
posals (Extended Data Table 1). Our analysis is unique in including all 
submitted proposals, both successful and unsuccessful, whereas most 
analyses are restricted to the published lists of funded proposals”'® or 
to samples of case studies*. 

Every application must nominate at least one of a defined set of 
1,238 Field of Research codes, assigning a percentage weighting to 
each code selected. Field of Research codes are grouped into related 
disciplines: for example, the Division ‘06 Biological Sciences’ contains 
nine groups, including ‘0603 Evolutionary Biology’, which contains 
12 Fields including ‘060309 Phylogeny and Comparative Analysis’ 
(Supplementary Fig. 1). Because we wish to capture the disciplinary 
breadth of proposals, we need to measure not only the number and rel- 
ative representation of disciplines selected, but also how disparate those 
research fields are. For example, we want to score a project that involves 
collaboration between biologists and artists as more interdisciplinary 
than one between biochemists and geneticists. Just as many biodiversity 
metrics use a phylogeny to measure disparity of species, research 
fields can be arranged as a dendrogram, where fields of enquiry that 
are more similar are more closely connected to each other than more 
distant fields. Our approach is not restricted to funding programmes 
that use hierarchically structured research codes, but can be applied 
to any research field identifiers by using patterns of co-occurrence to 
define clusters of similar fields’. 

Figure 1 | Relationship between funding success and IDD score. 
The central black line is the regression line 
(success = logit⁻¹[−1.35506 + (−0.40268 × IDD)]); grey area represents 
the confidence intervals (see Supplementary Information for details). 
The horizontal lines around 
the regression line represent the mean success rate for each of 10 ‘bins’ of 
IDD values of width 0.1. Each horizontal line’s y value represents the mean 
success rate of proposals whose IDD values fall within a ‘bin’ defined by 
the ends of the horizontal line. The bars at the top and bottom of the 
figure indicate the number of proposals for each IDD score, with darker 
lines corresponding to more proposals. Fitted line based on a generalized 
linear mixed model as described in the Supplementary Information 
(z = −6.789; P = 1.13 × 10⁻¹¹; n = 18,476). 

We use the phylogenetic species evenness’? metric to measure IDD. 
This metric was designed to compare biodiversity between samples (for 
example, between conservation areas), incorporating both evenness of 
species representation in the biota and relatedness between species””. 
IDD reflects both the relative contribution of disciplines within a pro- 
posal (balance) and scores collaborations between distantly related 
fields more highly than those between more closely related disciplines 
(disparity)*!. The metric is standardized so it falls between 0 (single 
disciplinary) and 1 (maximum disparity with even representation), 
allowing direct comparison between proposals (Extended Data Fig. 1). 
Patterns of co-occurrence of field identifiers (for example, research 
codes or key words) can be used to generate a hierarchy of discipline 
relationships (see Supplementary Information). Basing this hier- 
archy on observed patterns of collaboration would rank proposals 
with respect to the relative novelty of the disciplinary combinations 
proposed”. 
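
The properties described above (0 for a single discipline, 1 for maximally disparate fields with even representation) can be illustrated with a small sketch. It is not the authors' implementation, whose details are in the Supplementary Information. It assumes a simplified evenness formula, (1 − p′Cp) / (1 − 1/n), where p holds the percentage weightings as proportions and C is a similarity matrix with unit diagonal. The similarity values (2/3 for a shared group, 1/3 for a shared division, 0 otherwise) are hypothetical, as are the function names and every code other than ‘060309’.

```python
def code_similarity(code_a, code_b):
    """Similarity between two six-digit Field of Research codes, read off
    the code hierarchy: the first two digits give the division, the first
    four the group. The numeric values are illustrative assumptions."""
    if code_a == code_b:
        return 1.0
    if code_a[:4] == code_b[:4]:   # same group, different field
        return 2.0 / 3.0
    if code_a[:2] == code_b[:2]:   # same division, different group
        return 1.0 / 3.0
    return 0.0                     # different divisions


def idd_sketch(allocation):
    """Evenness-style interdisciplinarity score in [0, 1].

    `allocation` maps each selected code to its percentage weighting.
    A proposal with a single code scores 0 by definition.
    """
    codes = [c for c, w in allocation.items() if w > 0]
    n = len(codes)
    if n < 2:
        return 0.0
    total = sum(allocation[c] for c in codes)
    p = [allocation[c] / total for c in codes]

    # Quadratic form p'Cp: expected similarity of two randomly drawn codes.
    pcp = sum(
        p[i] * p[j] * code_similarity(codes[i], codes[j])
        for i in range(n)
        for j in range(n)
    )
    return (1.0 - pcp) / (1.0 - 1.0 / n)


# Hypothetical proposals (only '060309' is a code named in the text).
same_division = idd_sketch({"060309": 50, "060101": 50})
different_divisions = idd_sketch({"060309": 50, "190101": 50})
uneven = idd_sketch({"060309": 90, "190101": 10})
```

Under these assumptions an even split across two unrelated divisions scores higher than an even split within one division, and an uneven split scores lower than an even one, which is the qualitative behaviour that balance and disparity are meant to capture.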

The Australian Research Council provided de-identified data on 
all applications to the Discovery Programme for five annual funding 
rounds. We used a generalized linear mixed model to ask whether the 
IDD score of grant proposals is associated with funding success. We 
included as variables in the analysis the year of application, number of 
Field of Research codes selected per proposal, number of named chief 
investigators and institution (grouped into higher education networks: 
Extended Data Table 2). The response variable was a binary vector 
with two states: recommended for funding (1) or not recommended 
for funding (0). We provide details of the analysis and results in the 
Supplementary Information. 

We find that IDD is consistently negatively correlated with funding 
success (slope = −0.40, P = 1.1 × 10⁻¹¹; Fig. 1), independent of year of 
application, number of research codes selected and primary research 
field. Nearly all research fields have reduced funding success with 
increasing interdisciplinarity (Fig. 2 and Extended Data Table 3). If 
the association between IDD and funding success was largely a matter 
of averaging success rates of the component disciplines—so that less 
successful fields benefit from collaboration with more successful fields 
but more successful fields have their success rates reduced through 
collaboration—then we would expect many points both above and 
below y = 0 in Fig. 2. However, most fields have negative values of y, 
suggesting that proposals with high IDD are expected to have lower 
success rates than those with low IDD in most research divisions. 
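
To make the size of this effect concrete, the regression line quoted in the Fig. 1 caption can be evaluated directly. The sketch below uses only the two coefficients and the z value given in that caption; it ignores the other covariates and the random effects of the mixed model, so the numbers are indicative rather than a reproduction of the fitted model. The function names are hypothetical.

```python
import math

INTERCEPT = -1.35506   # intercept of the Fig. 1 regression line
SLOPE_IDD = -0.40268   # IDD coefficient of the Fig. 1 regression line


def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_success(idd):
    """Predicted probability of funding for an IDD score between 0 and 1."""
    return inv_logit(INTERCEPT + SLOPE_IDD * idd)


def two_sided_p_from_z(z):
    """Two-sided normal-theory P value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


p_single = predicted_success(0.0)   # single-discipline proposal
p_max = predicted_success(1.0)      # maximally interdisciplinary proposal
odds_ratio = math.exp(SLOPE_IDD)    # change in odds per unit of IDD
p_value = two_sided_p_from_z(-6.789)
```

On these coefficients the predicted success probability falls from roughly 0.21 at IDD = 0 to roughly 0.15 at IDD = 1, and the P value recovered from z = −6.789 agrees with the reported 1.13 × 10⁻¹¹.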

We conducted additional analyses using metrics that reflected only 
variety (number of codes) or balance (evenness) of disciplines (details 
in Supplementary Information), which demonstrated that it is both 
disparity and balance between disciplines that influence chance of 
funding success, justifying the use of the IDD metric which captures 
both of these aspects of interdisciplinarity. We also searched for relevant 
keywords in proposal titles and summaries, such as ‘interdisciplinary’, 
‘multidisciplinary’, ‘transdisciplinary’ and ‘cross-disciplinary’. We found 
that proposals with these keywords also had higher IDD measures and 
lower success rates, demonstrating that the metric-based approach ech- 
oes text-based analysis, yet with greater power to detect differences in 
funding rates (see Supplementary Information). 

IDD reflects the interdisciplinarity of the project, not the inclusion of 
practitioners from different disciplines, because a single researcher can 
devise a project that spans different disciplinary traditions''. To test the 
influence of collaboration, we added the number of named chief inves- 
tigators to the analysis. We find that proposals with more chief investi- 
gators have slightly higher success rates (slope =0.03, P=0.003), across 
all research fields, independently of any link between number of collab- 
orators and interdisciplinarity (Supplementary Table 2). Although the 
relationship between number of chief investigators and IDD is positive, 
the effect is small (Spearman’s ρ = 0.09), suggesting that the number 
of participants is not strongly associated with the interdisciplinarity of 
the project proposal. 

Because of the perceived negative association between funding 
success and interdisciplinarity, interdisciplinary projects are often 
regarded as high-risk proposals*. Do institutions with higher rates 
of funding success submit more interdisciplinary proposals (using 
higher success rates to support risky proposals) or fewer (because 
higher success rates arise from narrowly focused research)? We find 
that overall funding success rates varied between institutions, with 
significantly higher funding success rates in leading research-intensive 
universities (Extended Data Table 2)!*. Differences in IDD between 
institutions were very small (R² = 0.001) and the negative relationship 
between IDD and success rate was significant when institution was 
taken into account (slope = −0.39, P = 7.6 × 10⁻¹¹; Supplementary 
Table 2). This suggests that the negative relationship between inter- 
disciplinarity and funding success is not due to institutions with 
high funding success submitting more narrowly focused proposals 
(Extended Data Fig. 2). 

Why do interdisciplinary proposals have lower funding success rates? 
It is widely believed that grant evaluation processes are biased against 
interdisciplinary projects, because proposals may be assigned to a panel 
or reviewers who are ill-equipped to evaluate all parts of the project”, 
while more narrowly focused proposals may be better matched to 
assessor expertise*. Proposals that fit within a well-defined discipline 
may be more easily explained and justified, whereas the novelty of 
combinations of different perspectives may be more difficult to 
explain” or result in less-focused proposals‘. 

Figure 2 | Relationship between interdisciplinarity and funding success 
by research division. The x axis gives the success rate of a research 
division as the proportion of successfully funded proposals in that research 
division. The error bars along the x axis show the confidence interval of 
the success rate, approximated by the Wilson interval. The number of 
proposals in each research division is given in Extended Data Table 1. 
The y axis gives the average predicted difference in logit success rates 
between proposals with maximum interdisciplinarity (IDD = 1) compared 
with proposals with a single primary Field of Research code (IDD = 0) 
for each division, so that y = −0.5 indicates that the logit success rate 
of proposals with IDD = 1 is 0.5 lower than the logit success rate of 
proposals with IDD = 0. The standard error of the predicted difference 
(the error bar along the y axis) is the square root of the sum of the 
squared standard error of the IDD coefficient and that of the interaction 
coefficient. The average difference of each research division and its 
error bar are predicted by the generalized linear mixed model in 
Extended Data Table 3. 

While interdisciplinary research can have considerable benefits, 
it can also incur substantial costs, owing to the need to invest significant 
time in building collaborative relationships, developing a shared language 
and honing a common perspective from disparate viewpoints”. 
The outputs of interdisciplinary projects may be fewer and of different 
kinds to projects with a narrower disciplinary focus”®’’. Research 
evaluation systems with a narrow range of measures of success (for 
example, number of primary research publications in peer-reviewed 
journals) may disadvantage interdisciplinary proposals where some 
key outputs are less easily measured, such as the establishment of 
collaborative networks or data-sharing agreements”*. While some 
interdisciplinary studies produce significant advances, the average quality 
of interdisciplinary proposals may not be the same as more narrowly 
focused research. Studies of the long-term scholarly impact of 
interdisciplinary research have had mixed results: whereas some suggest 
greater benefits, others find no support for higher impact of 
interdisciplinary research¹²⁻¹⁴. 

Whatever the cause of the correlation, our result confirms the long-held 
belief that interdisciplinary proposals have lower funding success 
rates, providing the basis for further investigation into the development 
and evaluation of interdisciplinary research. Although IDD does not 
capture all key aspects of interdisciplinary research, it does provide a 
tractable and adaptable way of comparing interdisciplinarity between 
proposals and tracking trends in application rates and funding success. 
IDD can be applied to any funding programme where research fields 
are identified. Relatedness between disciplines can be defined a priori 
(for example, Field of Research codes), through clustering analysis 
of previous applications (see, for example, Extended Data Table 1), 
subjectively (based on experience) or by any other relevant means’. 
Such analyses will bring much needed clarity to determining whether 
interdisciplinary research programmes are being adequately supported 
under current funding models. In addition to enabling assessment of 
biases in success rates, the IDD metric could provide a way of identifying 
highly interdisciplinary proposals that might require special 
evaluation strategies, such as seeking reviewers who have experience 
in research spanning multiple fields. 
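
The two error bars described in the Figure 2 caption can be sketched as follows. The Wilson score interval is the standard textbook formula; the 95% level (z = 1.96) is an assumption, since the caption does not state the level used. The combined standard error follows the caption literally (square root of the sum of squared standard errors), which treats the two coefficient estimates as uncorrelated. Function names are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a 95% interval (assumed level).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p_hat + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


def combined_se(se_idd, se_interaction):
    """Standard error of the sum of two coefficients, ignoring their
    covariance, as described in the caption."""
    return math.hypot(se_idd, se_interaction)
```

For a hypothetical division with 20 funded proposals out of 100, `wilson_interval(20, 100)` gives an interval of roughly 0.13 to 0.29; unlike the simple normal approximation, the interval is not symmetric about the observed rate of 0.20.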


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 10 December 2015; accepted 11 May 2016. 


1. Rylance, R. Grant giving: global funders to focus on interdisciplinarity. 
Nature 525, 313-315 (2015). 

2. Ledford, H. How to solve the world’s biggest problems. Nature 525, 308-311 
(2015). 

3. Lyall, C., Bruce, A., Marsden, W. & Meagher, L. The role of funding agencies in 
creating interdisciplinary knowledge. Sci. Public Policy 40, 62-71 (2013). 

4. Woelert, P. & Millar, V. The “paradox of interdisciplinarity” in Australian 
research governance. High. Educ. 66, 755-767 (2013). 

5. Committee on Facilitating Interdisciplinary Research. Facilitating 
Interdisciplinary Research (National Academy of Sciences, National Academy of 
Engineering, Institute of Medicine, 2004). 

6. Langfeldt, L. The policy challenges of peer review: managing bias, conflict 
of interests and interdisciplinary assessments. Res. Eval. 15, 31-41 
(2006). 

7. Nichols, L. G. A topic model approach to measuring interdisciplinarity at the 
National Science Foundation. Scientometrics 100, 741-754 (2014). 




8. Van Noorden, R. Interdisciplinary research by the numbers. Nature 525, 
306-307 (2015). 

9. Porter, A. & Rafols, I. Is science becoming more interdisciplinary? Measuring 
and mapping six research fields over time. Scientometrics 81, 719-745 
(2009). 

10. Porter, A. L., Roessner, J. D., Cohen, A. S. & Perreault, M. Interdisciplinary 
research: meaning, metrics and nurture. Res. Eval. 15, 187-195 (2006). 

11. Wagner, C. S. et al. Approaches to understanding and measuring 
interdisciplinary scientific research (IDR): a review of the literature. 
J. Informetrics 5, 14-26 (2011). 

12. Yegros-Yegros, A., Rafols, I. & D’Este, P. Does interdisciplinary research lead to 
higher citation impact? The different effect of proximal and distal 
interdisciplinarity. PLoS ONE 10, e0135095 (2015). 

13. Wang, J., Thijs, B. & Glänzel, W. Interdisciplinarity and impact: distinct effects 
of variety, balance, and disparity. PLoS ONE 10, e0127298 (2015). 

14. Shi, X., Adamic, L. A., Tseng, B. L. & Clarkson, G. S. The impact of boundary 
spanning scholarly publications and patents. PLoS ONE 4, e6547 (2009). 

15. Huutoniemi, K., Klein, J. T., Bruun, H. & Hukkinen, J. Analyzing 
interdisciplinarity: typology and indicators. Res. Policy 39, 79-88 (2010). 

16. Bruun, H., Hukkinen, J., Huutoniemi, K. & Klein, J. T. Promoting Interdisciplinary 
Research: The Case of the Academy of Finland (The Academy of Finland, 2005). 

17. Bammer, G. Strengthening Interdisciplinary Research: What It Is, What It Does, 
How It Does It and How It Is Supported (Australian Council of Learned 
Academies, 2012). 

18. Ma, A., Mondragón, R. J. & Latora, V. Anatomy of funded research in science. 
Proc. Natl Acad. Sci. USA 112, 14760-14765 (2015). 

19. Helmus, M. R., Bland, T. J., Williams, C. K. & Ives, A. R. Phylogenetic measures of 
biodiversity. Am. Nat. 169, E68-E83 (2007). 

20. Cadotte, M. W. et al. Phylogenetic diversity metrics for ecological communities: 
integrating species richness, abundance and evolutionary history. Ecol. Lett. 
13, 96-105 (2010). 

21. Stirling, A. A general framework for analysing diversity in science, technology 
and society. J. R. Soc. Interface 4, 707-719 (2007). 

22. Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and 
scientific impact. Science 342, 468-472 (2013). 



23. Porter, A. L., Garner, J. & Crowl, T. Research coordination networks: evidence of 
the relationship between funded interdisciplinary networking and scholarly 
impact. Bioscience 62, 282-288 (2012). 

24. Boix Mansilla, V., Feller, I. & Gardner, H. Quality assessment in interdisciplinary 
research and education. Res. Eval. 15, 69-74 (2006). 

25. Haythornthwaite, C., Lunsford, K. J., Bowker, G. C. & Bruce, B. C. in New 
Infrastructures for Science Knowledge Production (ed. Hine, C.) 143-166 
(Idea Group, 2006). 

26. Laudel, G. Conclave in the Tower of Babel: how peers review interdisciplinary 
research proposals. Res. Eval. 15, 57-68 (2006). 

27. Goring, S. J. et al. Improving the culture of interdisciplinary collaboration in 
ecology by expanding measures of success. Front. Ecol. Environ 12, 39-47 (2014). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank the Australian Research Council for providing 
de-identified application data for analysis, and for their commitment to 
transparency and improvement of research proposal assessment. We are 
grateful to A. Byrne for his feedback and encouragement. We also thank 
M. Jennions for feedback, and G. Bammer, J. Bennett and the participants of the 
workshop on Interdisciplinary Research: Evaluating and Rewarding High-Quality 
Projects held at the University of New South Wales in August 2015. 


Author Contributions All authors contributed equally to this work. L.B. 
conceived the project and wrote the paper; R.D. and X.H. designed, conducted 
and interpreted the analyses. 


Author Information Reprints and permissions information is available 

at www.nature.com/reprints. The authors declare no competing financial 
interests. Readers are welcome to comment on the online version of the 
paper. Correspondence and requests for materials should be addressed to 
L.B. (lindell.bromham@anu.edu.au). 


Reviewer Information Nature thanks L. Amaral, M. Helmus and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


30 JUNE 2016 | VOL 534 | NATURE | 687 


© 2016 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 1 | Comparison of observed distribution of IDD scores to a null distribution. a, Distribution of IDD scores for 18,476
proposals to the Australian Research Council Discovery Programme, pooled over 5 years (2010-2014). b, Null distribution of IDD scores
generated by random sampling of Field of Research codes conditional on the observed frequencies of number of selected codes and percentage
allocations.

[Figure panels: A) Observed distribution of IDD; B) Null distribution of IDD]

[Caption fragment, Extended Data Figure 2] ... networks. The research-intensive Group of Eight (Go8) universities submit ... interdisciplinarity scores and success rates are similar across institutions.
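The null distribution described in the Extended Data Figure 1 caption can be sketched as a conditional resampling: each proposal keeps its number of codes and its percentage allocations, and only the identities of the codes are redrawn. The sketch below is an illustration under stated assumptions, not the authors' procedure. The function names, the toy code pool and, in particular, `toy_score` are hypothetical; `toy_score` is a stand-in and is not the IDD metric, whose definition is not given in this section.

```python
import random


def null_scores(proposals, code_pool, score_fn, rng):
    """Score randomized proposals.

    Each proposal is a list of percentage allocations, e.g. [60, 40].
    The number of codes and the allocations are kept as observed;
    only the codes themselves are drawn at random from code_pool.
    """
    scores = []
    for allocations in proposals:
        # Draw as many distinct codes as the proposal originally selected.
        codes = rng.sample(code_pool, k=len(allocations))
        scores.append(score_fn(list(zip(codes, allocations))))
    return scores


def toy_score(assignment):
    """Hypothetical placeholder score (NOT the paper's IDD).

    Returns the share of the allocation that falls outside the
    two-digit Division of the first-listed six-digit code.
    """
    primary_division = assignment[0][0][:2]
    outside = sum(pct for code, pct in assignment if code[:2] != primary_division)
    return outside / 100.0


# Hypothetical inputs: three proposals and a small pool of six-digit codes.
rng = random.Random(0)
proposals = [[100], [60, 40], [50, 30, 20]]
code_pool = ["010101", "010202", "060101", "060202", "170101"]
null = null_scores(proposals, code_pool, toy_score, rng)
```

Repeating the draw many times and pooling the scores would give a null histogram to set beside the observed one; swapping `toy_score` for the real metric is the only change the sketch would need.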
Extended Data Table 1 | Summary of proposals submitted to Australian Research Council Discovery Programme between 2010 and 2014


Domain          Division                                      Proposals  % Success  Median IDD  Median codes
01 Information  01 Mathematical Sciences                      893        27         0.54        3
                08 Information And Computing Sciences         1351       17         0.54        2
02 Matter       02 Physical Sciences                          1003       21         0.51        3
                03 Chemical Sciences                          1172       22         0.63        3
                09 Engineering                                2823       18         0.56        3
                10 Technology                                 582        20         0.80        3
03 Environment  04 Earth Sciences                             706        22         0.54        3
                05 Environmental Sciences                     418        17         0.75        3
04 Life         06 Biological Sciences                        2458       23         0.56        3
                07 Agricultural And Veterinary Sciences       181        12         0.77        3
                11 Medical And Health Sciences                905        18         0.65        3
05 Interaction  12 Built Environment And Design               265        10         0.51        2
                14 Economics                                  424        25         0.48        3
                15 Commerce, Management, Tourism & Services   582        13         0.33        2
06 Mind         13 Education                                  [values illegible]
                17 Psychology And Cognitive Sciences          896        26         0.32        2
                19 Studies In Creative Arts And Writing       269        17         0.64        2
07 Society      16 Studies In Human Society                   1161       21         0.56        3
                18 Law And Legal Studies                      368        22         0.32        3
                20 Language, Communication And Culture        722        21         0.55        3
                21 History And Archaeology                    534        28         0.33        2
                22 Philosophy And Religious Studies           329        19         0.50        3


The number of proposals with a primary identification to each Division in 5 years of pooled applications to the Australian Research Council Discovery Programme, with the percentage recommended 
for funding (% success), median interdisciplinary distance (IDD) and median number of six-digit FOR codes selected per application (Median codes). Divisions (first two digits of the FOR codes) have 
been clustered into Domains, as described in the Supplementary Information. Note that medical research is predominantly funded through a different scheme, as is research in collaboration with 
industry partners. 
Network                                  Institutions                          Proposals  Success  IDD
Group of Eight (Go8)                     University of Melbourne               10974      24%      0.56
                                         Australian National University
                                         University of Sydney
                                         University of Queensland
                                         University of Western Australia
                                         University of Adelaide
                                         Monash University
                                         University of New South Wales
Innovative Research Universities (IRU)   Charles Darwin University             1486       15%      0.57
                                         James Cook University
                                         Murdoch University
                                         Flinders University
                                         La Trobe University
                                         Griffith University
Australian Technology Network (ATN)      Queensland University of Technology   2181       14%      0.57
                                         University of Technology Sydney
                                         RMIT University
                                         University of South Australia
                                         Curtin University
Regional Universities Network (RUN)      Central Queensland University         324        10%      0.56
                                         Federation University Australia
                                         Southern Cross University
                                         University of New England
                                         University of Southern Queensland
                                         University of the Sunshine Coast
Unaligned                                                                      3476       17%      0.56


Australian higher education institutions grouped by network, with the number of proposals submitted to the Australian Research Council Discovery Programme over 5 years (2010-2014), the average
percentage success rate (percentages that were recommended for funding) and the median IDD score of all proposals.
Extended Data Table 3 | Effect size of interdisciplinarity on funding success in each division


Division  Coeff  Int'n  Mean Cohen's D (Y1 Y2 Y3 Y4 Y5)  Variance Cohen's D (Y1 Y2 Y3 Y4 Y5)  Wtd avg
Mathematical Sciences 0 0 -0.24 -0.21 -0.26 -0.03 -0.29 0.16 0.16 0.18 0.17 0.18 -0.20 
Physical Sciences -0.37"** 0.00 9.02 -0.32 -0.81 0.03 -0.16 0.15 0.20 0.19 0.18 0.18 -0.23 
Chemical Sciences -0.25* 042 910 0.12 -0.27 -0.15 0.11 0.14 0.16 0.16 0.16 0.18 -0.06 
Earth Sciences -0.25*  0.86* 907 -0.27 -0.01 0.13 0.01 0.18 0.21 0.21 0.21 0.23 -0.04 
Environmental Sciences -0.48"* 0.00 9.22 -0.19 0.10 -0.34 -0.10 0.25 0.31 0.34 0.29 0.30 -0.16 
Biological Sciences -0.20* 037 9.09 -0.17 -0.03 0.08 -0.16 0.09 0.11 0.11 0.12 0.11 -0.08 
Agricultural & Veterinary Sciences = !-92*** = 1.04 14 0.21 -0.24 -0.04 -0.26 0.75 0.63 0.44 0.55 0.45 -0.07 
Information & Computing Sciences -0.60*** 0.08 0.24 -0.01 -0.18 -0.39 -0.04 0.15 0.18 0.15 0.16 0.17 -0.18
Engineering -0.50"** 0.33 9.93 -0.03 -0.11 -0.08 -0.13 0.10 0.11 0.11 O11 O11 -0.11 
Technology 0.35" 0.28 O11 -0.10 -0.41 -0.05 -0.20 0.21 0.24 0.25 0.28 0.22 -0.12 
Medical & Health Sciences -0.50"** 0.57 901 0.15 -0.13 -0.41 0.04 0.16 0.17 0.22 0.22 0.24 -0.06 
Built Environment & Design -1.17*** 0.86 9.16 0.37 0.03 -0.67 -0.42 0.37 0.45 0.74 0.73 0.39 -0.07 
Education “0.59"* 0.58 9.25 -0.13 -0.34 0.05 -0.18 0.26 0.31 0.29 0.32 0.28 -0.07 
Economics 0.17 -0.07 9.23 -0.07 -0.05 -0.26 -0.24 0.27 0.24 0.29 0.22 0.28 -0.17 
Commerce, Management, Tourism & Services -0.90*** 0.61 0.21 -0.27 0.19 -0.01 0.65 0.24 0.24 0.30 0.30 0.43 0.01
Studies In Human Society -0.31** 0.09 9.32 -0.05 -0.03 -0.10 -0.36 0.16 0.17 0.16 0.16 0.17 -0.17 
Psychology & Cognitive Sciences O15 013° 9.41 -0.06 0.06 -0.21 -0.35 0.17 0.17 0.17 0.18 0.18 -0.19 
Law & Legal Studies -0.26+ = 0.76 9.22 (0.30 --0.57 -0.07 -0.15 0.27 0.28 0.28 0.29 0.35 -0.05 
Studies In Creative Arts & Writing  -9-0*** 0.56 9.46 034 0.70 -032 0.37 0.38 0.36 0.38 0.39 0.37 -0.01 
Language, Communication & Culture -0.35** 0.41 0.00 -0.18 -0.03 -0.13 -0.10 0.17 0.21 0.21 0.22 0.24 -0.08
History & Archaeology 0.03 -0.08 937 0.09 -0.51 -0.18 -0.10 0.19 0.21 0.24 0.25 0.23 -0.25 
Philosophy & Religious Studies -0.48** 0.30 911 -0.20 -0.11 -0.31 0.21 0.27 0.31 0.31 0.39 0.37 -0.11 


The generalized linear mixed model used to predict success rate has IDD, division and their interaction as fixed variables and year as a random variable. 'Mathematical Sciences' is used as the
reference category for other divisions, so its coefficient and interaction are zeros. Coefficients show differences in success rates of other divisions compared with Mathematical Sciences. Interaction
shows differences in the effect of interdisciplinarity on success rates of other divisions compared with Mathematical Sciences. The coefficient of IDD = -0.68**. A likelihood ratio test suggests that
including IDD as a fixed variable significantly increases model fit to the data (χ² with 22 degrees of freedom = 57.94, P = 4.5 × 10^-5). ***P = 0; **P = 0.001; *P = 0.01; +P = 0.05. Coeff, coefficient; Int'n, interaction; Y, year;
Wtd avg, weighted average.
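Two numbers in this table can be checked with a few lines of arithmetic. The sketch below assumes that the garbled test statistic in the caption reads as a chi-square of 57.94 on 22 degrees of freedom, and that the weighted average is an inverse-variance weighted mean of the yearly Cohen's D values; both are assumptions of this illustration, not statements from the source, and the function names are hypothetical. The chi-square tail uses the closed-form series that holds for even degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    For df = 2m the survival function is exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def inverse_variance_mean(effects, variances):
    """Weighted mean with weights 1/variance (assumed weighting scheme)."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


# Likelihood ratio statistic as read from the caption (assumed df = 22).
p_value = chi2_sf_even_df(57.94, 22)

# Mathematical Sciences row: yearly Cohen's D and their variances.
maths_d = [-0.24, -0.21, -0.26, -0.03, -0.29]
maths_var = [0.16, 0.16, 0.18, 0.17, 0.18]
maths_avg = inverse_variance_mean(maths_d, maths_var)
```

Under these assumptions `p_value` comes out close to the 4.5 × 10^-5 quoted in the caption, and `maths_avg` rounds to the -0.20 listed for Mathematical Sciences, which is consistent with, though not proof of, the assumed readings.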
Basal forebrain projections to the lateral habenula
modulate aggression reward 


Sam A. Golden!?, Mitra Heshmati!?*, Meghan Flanigan!*, Daniel J. Christoffel?, Kevin Guise!*, Madeline L. Pfau'”, 
Hossein Aleyasin', Caroline Menard!, Hongxing Zhang“, Georgia E. Hodes!, Dana Bregman!, Lena Khibnik', Jonathan Tail, 
Nicole Rebusi!, Brian Krawitz!?, Dipesh Chaudhury’, Jessica J. Walsh?, Ming-Hu Han!4, Matt L. Shapiro! & Scott J. Russo! 


Maladaptive aggressive behaviour is associated with a number
of neuropsychiatric disorders1 and is thought to result partly
from the inappropriate activation of brain reward systems in
response to aggressive or violent social stimuli2. Nuclei within
the ventromedial hypothalamus3-5, extended amygdala6 and
limbic7 circuits are known to encode initiation of aggression;
however, little is known about the neural mechanisms that directly
modulate the motivational component of aggressive behaviour8.
Here we established a mouse model to measure the valence of 
aggressive inter-male social interaction with a smaller subordinate 
intruder as reinforcement for the development of conditioned 
place preference (CPP). Aggressors develop a CPP, whereas non- 
aggressors develop a conditioned place aversion to the intruder- 
paired context. Furthermore, we identify a functional GABAergic 
projection from the basal forebrain (BF) to the lateral habenula 
(lHb) that bi-directionally controls the valence of aggressive
interactions. Circuit-specific silencing of GABAergic BF-lHb
terminals of aggressors with halorhodopsin (NpHR3.0) increases
lHb neuronal firing and abolishes CPP to the intruder-paired
context. Activation of GABAergic BF-lHb terminals of non-
aggressors with channelrhodopsin (ChR2) decreases lHb neuronal
firing and promotes CPP to the intruder-paired context. Finally,
we show that altering inhibitory transmission at BF-lHb terminals
does not control the initiation of aggressive behaviour. These results
demonstrate that the BF-lHb circuit has a critical role in regulating
the valence of inter-male aggressive behaviour and provide novel 
mechanistic insight into the neural circuits modulating aggression 
reward processing. 

To study individual differences in aggression, we adapted the 
sensory contact model of social defeat for CD-1 mice9-11, which exhibit
a wide spectrum of aggressive behaviours. In this procedure (Fig. 1a),
a sexually experienced adult male CD-1 mouse is presented with a
series of novel 6-8-week-old subordinate male C57BL/6J intruder mice,
who do not themselves exhibit any aggressive behaviours towards CD-1 
mice (Extended Data Fig. 1a-i). This procedure identifies individual 
differences in antagonist aggressive behaviours without producing 
lasting stress-related behavioural phenotypes (Extended Data Table 1). 
Ethological analysis revealed that approximately 70% (310/448) of mice 
exhibited aggressive behaviour (termed aggressors (AGGs)) during 
at least one session, while approximately 30% (138/448) failed to 
initiate aggressive behaviour (termed non-aggressors (NONs)) at any 
time (Fig. 1b). 
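The split reported above is simple arithmetic on the counts 310/448 and 138/448. As a small illustration, the sketch below recomputes the two percentages and adds a 95% Wilson score interval for the aggressor fraction. The interval is an addition for illustration only; it is not reported in the source, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


n_mice = 448
n_agg = 310                       # mice that attacked in at least one session
n_non = n_mice - n_agg            # mice that never initiated an attack

pct_agg = 100.0 * n_agg / n_mice  # about 69%, reported as "approximately 70%"
pct_non = 100.0 * n_non / n_mice  # about 31%, reported as "approximately 30%"
low, high = wilson_interval(n_agg, n_mice)
```

The counts add up to 448, so the two groups are exhaustive: every screened mouse is classified as either AGG or NON.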

After repeated intruder interactions, AGGs have elevated serum 
testosterone (Fig. 1c) and decreased corticosterone (Fig. 1d) levels 
relative to NONs, suggesting that NONs may be less dominant and 
experience forced intruder interactions as more stressful. Analysis 
of several common metrics for aggression revealed normalized 


distributions across AGGs that increased between screening sessions 
(Fig. 1e, f and Extended Data Fig. 2a-g). Importantly, the mean number
of attack bouts (Extended Data Fig. 2f) and mean duration of attack 
bouts (Extended Data Fig. 2g) significantly correlate to mean attack 
latency. Therefore, attack latency provides a reliable index of aggression 
behaviours. Subsequently, we focused on AGGs that exhibited attack 
latencies within the most aggressive quartile of the sample distribution. 
These data confirm that outbred CD-1 mice exhibit a wide spectrum 
of aggressive behaviour and physiological responses to an intruder, 
leading us to hypothesize that there may be differences in the valence 
of intruder interactions among AGGs and NONs. 

To assay the motivational state associated with intruder pairings, 
we developed an aggression-based CPP procedure. In this model, 
CD-1 mice are screened for aggression phenotype and then conditioned
for CPP (Fig. 1g) by receiving novel C57BL/6J intruder-paired
or intruder-unpaired sessions twice a day for three days. AGGs show a
CPP for the intruder-paired context, while NONs show a conditioned
place aversion (CPA) (Fig. 1h-j and Extended Data Fig. 3a-d). CPA
in NONs does not appear to result from baseline differences in mood
and anxiety or lack of interest in social targets (Extended Data Tables 1
and 2). However, we found that the valence of intruder interactions in
AGGs and NONs is dependent upon intruder mice being freely moving
and physically accessible during conditioning. Using a sensory CPP 
procedure in which the intruder mouse is placed in a protective cage 
within the intruder-paired context, both CPP and CPA are abolished 
(Fig. 1k-n and Extended Data Fig. 3e-h). These data demonstrate 
individual differences in the positive or negative valence of intruder 
interactions in AGGs versus NONs. 
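The figure captions report a "normalized" and a "subtracted" CPP score, but their formulas are in the Methods, which are not part of this section. The sketch below therefore uses assumed definitions that are common in place-conditioning work: the subtracted score is test time minus pre-test time in the intruder-paired context, and the normalized score is test time as a percentage of pre-test time. The function names and the example durations are hypothetical.

```python
def cpp_scores(pre_paired_s, test_paired_s):
    """Return (subtracted, normalized) scores from time in the paired context.

    Assumed definitions, not taken from the source:
      subtracted = test - pre-test (seconds)
      normalized = 100 * test / pre-test (percent of pre-test)
    """
    if pre_paired_s <= 0:
        raise ValueError("pre-test duration must be positive")
    subtracted = test_paired_s - pre_paired_s
    normalized = 100.0 * test_paired_s / pre_paired_s
    return subtracted, normalized


def classify(subtracted):
    """Label the direction of the change in time spent in the paired context."""
    if subtracted > 0:
        return "CPP"        # preference: more time in the intruder-paired context
    if subtracted < 0:
        return "CPA"        # aversion: less time in the intruder-paired context
    return "no change"


# Hypothetical animal: 300 s in the paired context before, 420 s after conditioning.
subtracted, normalized = cpp_scores(300.0, 420.0)
label = classify(subtracted)
```

With either definition the sign convention is the same: AGGs would fall on the preference side and NONs on the aversion side, which is the contrast the paragraph above describes.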

Clinical12 and preclinical8 studies have implicated BF structures,
such as the nucleus accumbens (NAc), lateral septum and diagonal
band nuclei (DBN), as potentially important brain regions controlling
aggression-related behaviours. However, there has been limited
functional evidence that the BF, or its projections, directly modulate
the rewarding aspects of aggression. To define BF projections, we
injected an adeno-associated virus (AAV) vector expressing enhanced
yellow fluorescent protein (eYFP) under a neuronal-specific human
synapsin (hSyn) promoter (AAV2-hSyn-eYFP) into the BF of CD-1 mice
(Fig. 2a-c, top, and Extended Data Fig. 4a-c), targeting the
more anterior septo-accumbal transition zone of the basal forebrain13,
and observed a prominent axonal projection to the lHb (Fig. 2b, top).

To characterize BF-lHb projections further, we injected the lHb
(Fig. 2a, b, bottom) with a retrograde monosynaptic glycoprotein-dead
rabies virus (G-deleted-rabies-eGFP)14. Within the anterior BF that
overlaps with our anterograde viral infection, we observed retrograde 
labelling in the septum (~45%), DBN (~35%) and the medial NAc shell 
(~15%) (Fig. 2c, d, bottom). Within retrogradely labelled BF slices, we 
performed in situ hybridization for GAD67, a marker of inhibitory 


1Fishberg Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA. 2Graduate Program in Neuroscience, Icahn
School of Medicine at Mount Sinai, New York, New York 10029, USA. 3Department of Psychiatry and Behavioral Sciences, Stanford University Medical Center, Palo Alto, California 94305, USA.
4Pharmacology and Systems Therapeutics and Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
Figure 1 | Individual differences in aggression-related reward
behaviour. a, Aggression screening: experimental schematic. b, Percentage
of mice exhibiting aggressive (AGG) versus non-aggressive (NON)
behaviours. c, d, Serum testosterone (t(16) = 2.23, *P < 0.05; two-tailed
unpaired t-test, n = 9 per group) (c) and corticosterone (t(19) = 3.231,
**P < 0.01; two-tailed unpaired t-test, n = 10-11 per group) (d).
e, f, Mean latency to attack (e) (F(2,1333) = 49.37, two-way analysis of
variance (ANOVA) P < 0.001; post-hoc test, ***P < 0.001; n = 138-310)
and attack duration (f) (F(2,1333) = 22.35, two-way ANOVA P < 0.001;
post-hoc test, ***P < 0.001; n = 138-310). g, Aggression CPP schematic.


GABAergic neurons, and observed colocalization within the septum 
(~75%), DBN (~80%) and medial NAc shell (100%) (Fig. 2e). 

To identify whether BF and lHb neurons are differentially activated
by intruder interactions in AGGs and NONs, we examined c-Fos
immunoreactivity 1 h after the final intruder screening. AGGs exhibit
elevated c-Fos immunoreactivity in the septo-accumbal transition zone
of the BF relative to NONs (Fig. 2f, g). Within the eYFP-positive BF
terminal fields in the medial lHb, NONs exhibit increased c-Fos
immunoreactive nuclei relative to AGGs (Fig. 2f, g). This finding was
corroborated by slice electrophysiology, in which NONs exhibit an
increase in lHb firing rates compared with AGGs 1 h after an intruder
interaction that returns to baseline by 7 days after intruder interaction
(Fig. 2h, i). Together, these data show that lHb neurons are differentially
regulated by intruder interactions, possibly through inhibitory BF inputs.

To determine the functional contribution of BF-lHb projections,
we conducted optogenetic circuit-specific terminal photostimulation
in combination with slice electrophysiology with channelrhodopsin
(AAV2-hSyn-ChR2(H134R)-eYFP) or halorhodopsin (AAV2-hSyn-
NpHR3.0-eYFP), identifying photostimulation parameters that produce
robust transient lHb activation or inhibition without rebound
neuronal firing. ChR2^BF-lHb terminal photostimulation at 40 Hz
resulted in significantly decreased lHb firing rates (Fig. 2j, k), while
NpHR3^BF-lHb (8 s on, 2 s off) terminal photostimulation resulted in a
robust increase in postsynaptic lHb firing rates (Fig. 2l, m). Importantly,
whole-cell recordings from lHb neurons during ChR2^BF-lHb terminal
photostimulation showed a significant increase in inhibitory postsynaptic
currents (IPSCs) that was completely blocked by the GABAA
receptor antagonist, gabazine (Extended Data Fig. 4d, e). Optically
induced IPSCs exhibited a response delay of ~7 ms (Extended Data
Fig. 4f), which is in line with previously published response delays for
ChR2 at monosynaptic circuits. Similarly, anterograde tracing of BF
terminals in the lHb revealed that they were colocalized with vesicular


h, Representative heatmaps of aggression CPP. norm., normalized.
i, j, Normalized (t(14) = 4.706, ***P < 0.001; two-tailed unpaired t-test, n = 8
per group) (i) and subtracted CPP score (t(14) = 4.013, **P < 0.01;
two-tailed unpaired t-test, n = 8 per group) (j). k, Sensory CPP schematic.
l, Representative heatmaps of sensory CPP. m, n, Normalized (m)
(t(18) = 1.023, P > 0.05; two-tailed unpaired t-test, n = 10 per group) and
subtracted (n) CPP score (t(18) = 0.961, P > 0.05; two-tailed unpaired t-test,
n = 10 per group). Summary data are represented as mean ± standard
error of the mean. n.c., no change. Experiments were conducted once;
n indicates biological replicates.
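The caption reports two-tailed unpaired t-tests. As a minimal sketch of how such a statistic and its degrees of freedom arise, the code below computes Student's t with a pooled variance. Whether the authors used the pooled (equal-variance) form or a Welch correction is not stated here, so the pooled form is an assumption, and the sample values are made up for illustration. The P value is omitted because the standard library has no t distribution.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's t statistic for two independent samples (pooled variance).

    Returns (t, df) with df = len(a) + len(b) - 2.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, df


# Hypothetical scores for two groups of five animals each.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = unpaired_t(group_a, group_b)
```

The degrees of freedom give a quick consistency check on a caption: two groups of 8 give 8 + 8 - 2 = 14, and two groups of 10 give 18.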


GABA transporter (VGAT), but not vesicular glutamate transporter 1 
(VGLUT1) (Extended Data Fig. 4g). To validate these findings within 
an intact system, we used multi-electrode recording of postsynaptic 
lHb firing rates in anaesthetized mice in combination with terminal
photostimulation (Extended Data Fig. 5a). Results show that activation
(40 Hz ChR2^BF-lHb) or inhibition (8 s on, 2 s off NpHR3^BF-lHb) of
presynaptic BF terminals in the lHb resulted in decreased or increased
lHb postsynaptic neuronal firing, respectively (Extended Data
Fig. 5b-d). These functional in vitro and in vivo recordings of
ChR2^BF-lHb and NpHR3^BF-lHb confirm inhibitory GABAergic
control over circuit activity and demonstrate reliable temporal control
of lHb firing rates by optogenetic tools for in vivo behavioural analysis.
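The two stimulation regimes named above, 40 Hz pulses for ChR2 and an 8 s on, 2 s off cycle for NpHR3, can be written down as simple timing functions. This is only a sketch of the timing arithmetic: pulse width, light power and session length are not given in this section, so the one-second window used below is a hypothetical value and the function names are illustrative.

```python
def light_is_on(t_s, on_s=8.0, off_s=2.0):
    """True if a repeating on/off cycle has the light on at time t_s (seconds)."""
    return (t_s % (on_s + off_s)) < on_s


def duty_cycle(on_s=8.0, off_s=2.0):
    """Fraction of each cycle during which the light is on."""
    return on_s / (on_s + off_s)


def pulse_onsets(rate_hz, duration_s):
    """Onset times (seconds) of evenly spaced pulses at rate_hz over duration_s."""
    n_pulses = int(round(rate_hz * duration_s))
    return [i / rate_hz for i in range(n_pulses)]


# 8 s on, 2 s off: light is on for 80% of each 10 s cycle.
fraction_on = duty_cycle()

# 40 Hz over a hypothetical 1 s window: 40 pulses, one every 25 ms.
onsets = pulse_onsets(40, 1.0)
```

Keeping the inhibition cycle and the pulse train as separate functions mirrors the text, where the two opsins are driven with different patterns.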

To investigate the functional consequences of BF-lHb neuronal
firing on aggression reward, we paired photostimulation of ChR2^BF-lHb
and NpHR3^BF-lHb in AGGs and NONs during the CPP test (Fig. 3a, b).
NON::ChR2^BF-lHb stimulation promoted CPP (Fig. 3c-e), mimicking
responses observed in control AGGs. Conversely, AGG::NpHR3^BF-lHb
stimulation induced CPA (Fig. 3f-h), mimicking responses observed
in control NONs. Neither NON::NpHR3^BF-lHb nor AGG::ChR2^BF-lHb
stimulation significantly affected the expression of CPP or CPA. Viral
expression (Extended Data Fig. 6a-f) and locomotor activity (Extended
Data Fig. 6g-j) were not different between conditions. These data
confirm that BF-lHb circuitry modulates the rewarding component
of aggressive behaviour and is both necessary and sufficient for the
expression of CPP in AGGs and CPA in NONs.

To determine if these circuit-specific effects could be recapitulated
by direct lHb cell body manipulation, we injected the lHb with AAV2-
hSyn-ChR2-eYFP or AAV-hSyn-NpHR3.0-eYFP (Extended Data
Fig. 7a-d) and directly stimulated lHb cell bodies using previously
established optogenetic parameters for lHb15. NON::NpHR3.0^lHb stimulation
to decrease lHb firing promoted CPP to the intruder-paired
side (Extended Data Fig. 7e-g), whereas AGG::ChR2^lHb stimulation to
Figure 2 | GABAergic BF-lHb circuit is differentially activated by
intruder interactions. a, Schematic of anterograde and retrograde tracing
strategies. b, Representative anterograde AAV2-hSyn-eYFP infections
(top, terminals) or retrograde G-deleted-rabies-eGFP infections (bottom,
infection site) in lHb. Scale bars: 500 µm; insets, 150 µm. c, Representative
anterograde AAV2-hSyn-eYFP infections (top, infection site) or
retrograde G-deleted-rabies-eGFP infections (bottom, cell bodies) in the
BF. Scale bars: 400 µm; insets, 200 µm. d, Percentage retrograde-labelled
eGFP+ neurons within subnuclei of the anterior BF (n = 3 mice, ~229 cells
per mouse). e, Representative in situ hybridization colocalized GAD67
and eGFP in DBN (left) and quantification (right) within the BF (n = 3
mice, 14 cells per mouse). Scale bars: 20 µm. f, Representative images of
AAV2-hSyn-eYFP infection and c-Fos immunoreactivity in medial NAc
shell transition zone of the BF (top) and medial lHb terminals (bottom).
Scale bars: 30 µm. g, Quantification of c-Fos immunoreactivity in the
medial NAc shell-septum transition zone (t(12) = 2.655, *P < 0.05; two-tailed


increase lHb firing induced CPA to the intruder-paired side (Extended
Data Fig. 7h-j). Taken together, these results implicate the lHb as a key 
modulator of aggression motivational state. 

To determine if BF-lHb neuronal activity regulates the initiation
or intensity of aggressive behaviour, we used ChR2^BF-lHb and
NpHR3^BF-lHb (Fig. 4a) in AGGs and NONs during home-cage
resident-intruder testing (Fig. 4b). Neither activation nor inhibition
of BF-lHb terminals resulted in the initiation of aggressive behaviour
(Fig. 4c, d), nor did it modulate social (Fig. 4e) or non-social (Fig. 4f)
exploratory behaviours in NON mice. Similarly, AGG::ChR2^BF-lHb
stimulation failed to initiate immediate attack behaviour, as indexed
by no change in attack latency (Fig. 4g). However, AGG::ChR2^BF-lHb
and AGG::NpHR3^BF-lHb stimulation bi-directionally modulated the
severity of the aggressive behaviour relative to each other, although
only a nonsignificant trend was observed when either was compared to
AGG::GFP^BF-lHb (Fig. 4h). As observed in NONs, AGG::ChR2^BF-lHb
and AGG::NpHR3^BF-lHb photostimulation failed to alter either social
(Fig. 4i) or non-social (Fig. 4j) exploratory behaviours. These data
indicate that the BF-lHb circuit is important in modulating the intensity
of aggressive behaviour; however, it is not a traditional attack initiation
area.

On the basis of these data, we hypothesized that the BF-lHb circuit
acts in other affective behavioural domains. We performed a behavioural
battery to measure non-social generalized anxiety states and reward
in naive CD-1 mice (Extended Data Fig. 8a). Both ChR2^BF-lHb and
unpaired t-test, n = 6-8 mice per group, 3 slices per mouse) and medial
lHb (t(12) = 5.678, ***P < 0.001; two-tailed unpaired t-test, n = 6-8 mice
per group, 3 slices per mouse). h, Firing rate of lHb neurons in AGG and
NON mice at 1 h or 7 days after intruder interaction (F(1,67) = 10.56, two-
way ANOVA P < 0.05; post-hoc test, **P < 0.01; n = 16-19 cells per group,
4-5 mice per group). i, Representative trace of lHb in vitro cell-attached
firing rates. j, Representative trace of lHb in vitro cell-attached firing
rates during ChR2^BF-lHb photostimulation. k, Average firing rates of lHb
neurons during ChR2^BF-lHb photostimulation (t(10) = 3.679, **P < 0.01; two-tailed unpaired
t-test, n = 6 cells). l, Representative trace of lHb in vitro cell-attached firing
rates during NpHR3^BF-lHb photostimulation. m, Average firing rates of
lHb neurons during NpHR3^BF-lHb photostimulation (t(18) = 11.68, ***P < 0.0001; two-tailed
unpaired t-test, n = 10 cells). Data are represented as
mean ± s.e.m. aLS, anterior lateral septum; DAPI, 4',6-diamidino-2-
phenylindole; mNAcs, medial nucleus accumbens shell. Experiments were
conducted once; n indicates biological replicates.


NpHR3^BF-lHb terminal photostimulation failed to modulate anxiety-
like behaviours in the open field (Extended Data Fig. 8b, c) and
elevated plus maze tasks (Extended Data Fig. 8d, e). However,
ChR2^BF-lHb stimulation potentiates the rewarding effects of cocaine
by increasing the amount of time spent in the cocaine-paired chamber
(Extended Data Fig. 8f). Therefore, while the BF-lHb circuit does not
influence a generalized anxiety phenotype in the absence of social 
context or other stimuli, it does generalize to non-social rewarding 
stimuli such as cocaine. 

Our results show individual differences in the rewarding properties 
of aggressive social interaction that are mediated by the BF-lHb circuit.
When exposed to an intruder, AGGs exhibit increased activity of the BF
and a concomitant reduction in lHb neuronal firing relative to NONs,
contributing to a behavioural preference for environmental contexts
associated with the interaction. Importantly, this circuit is not sufficient
to induce attack behaviour. Although anatomical studies have identified
this BF projection to the lHb in mice and rats16-18, and diffusion tensor
imaging (DTI) suggests probabilistic tract connections between the
BF and lHb in humans19, this is the first study to provide functional
evidence that GABAergic BF projections produce inhibitory control
of lHb neurons to regulate the valence of aggressive intruder-based
interactions. Stimulation or inhibition of BF-lHb projections is both
sufficient and necessary to alter the positive or negative valence of an
intruder-paired context. Our findings advance the understanding of
lHb function in a behaviourally relevant animal model of aggression
Figure 3 | BF-lHb activity bi-directionally modulates aggression
reward. a, b, Schematic of optogenetic viral infection strategy (a) and
aggression CPP procedure (b). c, f, Representative CPP heatmaps for
eYFP^BF-lHb, ChR2^BF-lHb and NpHR3^BF-lHb between NON (c) and
AGG (f) mice. norm., normalized. d, Normalized (F(2,26) = 5.019,
one-way ANOVA P < 0.05; post-hoc test, *P < 0.05, n = 9-10 per group)
and subtracted CPP score (F(2,25) = 6.666, one-way ANOVA P < 0.01;
post-hoc test, *P < 0.05, n = 9-10 per group) in NON::eYFP^BF-lHb,
NON::ChR2^BF-lHb and NON::NpHR3^BF-lHb mice. e, Individual duration
in intruder-paired context for NON::eYFP^BF-lHb mice (t(9) = 0.9129,
P > 0.05; two-tailed paired t-test, n = 10 per group), NON::ChR2^BF-lHb
mice (t(8) = 2.362, *P < 0.05; two-tailed paired t-test, n = 9 per group), and
NON::NpHR3^BF-lHb mice (t(9) = 2.344, *P < 0.05; two-tailed paired t-test,
n = 10 per group) during the pre-test and test sessions. g, Normalized
(F(2,20) = 5.470, one-way ANOVA P < 0.05; post-hoc test, *P < 0.05, n = 7-8
per group) and subtracted CPP score (F(2,20) = 4.964, one-way ANOVA
P < 0.05; post-hoc test, *P < 0.05, n = 7-8 per group) for intruder-paired
context in AGG::eYFP^BF-lHb, AGG::ChR2^BF-lHb and AGG::NpHR3^BF-lHb.
h, Individual duration in intruder-paired context for AGG::eYFP^BF-lHb
mice (t(6) = 5.070, **P < 0.01; two-tailed paired t-test, n = 7 per group),
AGG::ChR2^BF-lHb mice (t(7) = 2.394, *P < 0.05; two-tailed paired t-test,
n = 8 per group), and AGG::NpHR3^BF-lHb mice (t(7) = 1.763, P > 0.05;
two-tailed paired t-test, n = 8 per group). Summary data are represented
as mean ± s.e.m. n.c., no change. Experiments were conducted once;
n indicates biological replicates.
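The group comparisons in this caption are one-way ANOVAs across three viral conditions. The sketch below shows how the F statistic and its two degrees of freedom are obtained from raw group data; the numbers are invented for illustration and the function name is hypothetical. The P value is left out because the standard library has no F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical CPP scores for three conditions (control, ChR2, NpHR3).
control = [1.0, 2.0, 3.0]
chr2 = [2.0, 3.0, 4.0]
nphr3 = [3.0, 4.0, 5.0]
f_stat, df_b, df_w = one_way_anova([control, chr2, nphr3])
```

The degrees of freedom follow from the group sizes alone: three groups of 10, 9 and 10 animals give 2 between-group and 29 - 3 = 26 within-group degrees of freedom.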


motivation and provide further understanding into the physiology and 
neural circuitry of aggression and reward-related behaviours. 

While numerous functions have been ascribed to lHb20 neuronal
activity, including anxiety21, addiction22 and depression23, there is a
noticeable paucity of functional data addressing the role of lHb inputs,
outside of those originating from the VTA region, within any of these
behavioural domains. Indeed, anatomical tracing experiments have
highlighted the complexity of lHb afferents24 and efferents25. With
regard to the BF, the lateral septum, DBN and medial NAc shell, but
not core, are known to send projections to the lHb24,26. Our study implicates
the septo-accumbal transition zone of BF as a critical source of
GABAergic tone to the lHb within the context of motivated behaviour.
However, on the basis of the fact that these BF GABAergic inputs to lHb
Figure 4 | BF-lHb does not initiate attack but modulates aggression
severity. a, b, Schematic of optogenetic viral infection strategy (a) and
aggression procedure (b). c-f, NON attack latency (c), attack duration (d),
social exploration (e) and non-social exploration behaviours (f) in pre-test
and test sessions (non-significant, n = 7-8 per group). g-j, AGG attack
latency (F(2,42) = 6.01, two-way ANOVA P < 0.001, *P < 0.05, n = 7-9 per
group) (g), attack duration (F(2,42) = 5.666, two-way ANOVA P < 0.001,
*P < 0.05, n = 7-9 per group) (h), social exploration (i) and non-social
exploration behaviours (j) in pre-test and test sessions. Experiments were
conducted once; n indicates biological replicates.


exhibit high tonic activation in acute slice that can be rapidly inhibited
by terminal inhibition with NpHR3 (Fig. 2l, m), it is unlikely that they
are NAc medium spiny neurons. Finally, based on both in vitro and
in vivo electrophysiological studies, as well as anatomical tracing, we note
that there may be a small subset of cells in the BF that either release an
excitatory neurotransmitter or act indirectly on the lHb via di-synaptic
inputs. It will be interesting in the future to determine what role these 
neurons have in reward processing. 

Our results may provide important information to clinical studies 
identifying novel targets of deep brain stimulation in the treatment of 
neuropsychiatric conditions that present with aggression co-morbidity 
such as substance abuse”’ and depression”®. Deep brain stimulation 
protocols within specific BF nuclei” and the IHb*®” have been success- 
fully used to treat intractable major depressive disorder, which is asso- 
ciated with symptoms of increased aggression in men”’. Overall, our 
findings demonstrate a previously unidentified functional role for the 
lHb and its inputs from the BF in mediating the rewarding component
of aggression, and suggest that targeting shared underlying deficits in 
motivational circuitry may provide useful information for the devel- 
opment of novel therapeutic strategies for treating aggression-related 
neuropsychiatric disorders. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 6 April 2015; accepted 17 May 2016. 


1. Anderson, D. J. Optogenetics, sex, and violence in the brain: implications for 
psychiatry. Biol. Psychiatry 71, 1081-1089 (2012). 

2. Decety, J., Michalska, K. J., Akitsuki, Y. & Lahey, B. B. Atypical empathic 
responses in adolescents with aggressive conduct disorder: a functional 
MRI investigation. Biol. Psychol. 80, 203-211 (2009). 

3. Yang, C. F. et al. Sexually dimorphic neurons in the ventromedial hypothalamus
govern mating in both sexes and aggression in males. Cell 153, 896-909 (2013).


30 JUNE 2016 | VOL 534 | NATURE | 691 


© 2016 Macmillan Publishers Limited. All rights reserved 


4. Wasman, M. & Flynn, J. P. Directed attack elicited from hypothalamus.
Arch. Neurol. 6, 220-227 (1962).

5. Lin, D. et al. Functional identification of an aggression locus in the mouse
hypothalamus. Nature 470, 221-226 (2011).

6. Unger, E. K. et al. Medial amygdalar aromatase neurons regulate aggression in
both sexes. Cell Reports 10, 453-462 (2015).

7. Yu, Q. et al. Optogenetic stimulation of DAergic VTA neurons increases
aggression. Mol. Psychiatry 19, 635 (2014).

8. Takahashi, A. & Miczek, K. A. Neurogenetics of aggressive behavior: studies in
rodents. Curr. Top. Behav. Neurosci. 17, 3-44 (2013).

9. Kudryavtseva, N. N., Bakshtanovskaya, I. V. & Koryakina, L. A. Social model of
depression in mice of C57BL/6J strain. Pharmacol. Biochem. Behav. 38,
315-320 (1991).

10. Golden, S. A., Covington, H. E. III, Berton, O. & Russo, S. J. A standardized
protocol for repeated social defeat stress in mice. Nat. Protocols 6, 1183-1191
(2011).

11. Miczek, K. A., DeBold, J. F. & Thompson, M. L. Pharmacological, hormonal,
and behavioral manipulations in analysis of aggressive behavior. Prog. Clin.
Biol. Res. 167, 1-26 (1984).

12. Glenn, A. L. & Yang, Y. The potential role of the striatum in antisocial behavior
and psychopathy. Biol. Psychiatry 72, 817-822 (2012).

13. Zahm, D. S., Parsley, K. P., Schwartz, Z. M. & Cheng, A. Y. On lateral septum-like
characteristics of outputs from the accumbal hedonic “hotspot” of Peciña and
Berridge with commentary on the transitional nature of basal forebrain
“boundaries”. J. Comp. Neurol. 521, 50-68 (2013).

14. Callaway, E. M. & Luo, L. Monosynaptic circuit tracing with glycoprotein-deleted
rabies viruses. J. Neurosci. 35, 8979-8985 (2015).

15. Lammel, S. et al. Input-specific control of reward and aversion in the ventral
tegmental area. Nature 491, 212-217 (2012).

16. Herkenham, M. & Nauta, W. J. Afferent connections of the habenular nuclei in
the rat. A horseradish peroxidase study, with a note on the fiber-of-passage
problem. J. Comp. Neurol. 173, 123-145 (1977).

17. Sutherland, R. J. The dorsal diencephalic conduction system: a review of the
anatomy and functions of the habenular complex. Neurosci. Biobehav. Rev. 6,
1-13 (1982).

18. Lecca, S., Meye, F. J. & Mameli, M. The lateral habenula in addiction and
depression: an anatomical, synaptic and behavioral overview. Eur. J. Neurosci.
39, 1170-1178 (2014).

19. Shelton, L. et al. Mapping pain activation and connectivity of the human
habenula. J. Neurophysiol. 107, 2633-2648 (2012).

20. Hikosaka, O. The habenula: from stress evasion to value-based decision-
making. Nature Rev. Neurosci. 11, 503-513 (2010).

21. Lee, E. H. & Huang, S. L. Role of lateral habenula in the regulation of
exploratory behavior and its relationship to stress in rats. Behav. Brain Res.
30, 265-271 (1988).

22. Maroteaux, M. & Mameli, M. Cocaine evokes projection-specific synaptic
plasticity of lateral habenula neurons. J. Neurosci. 32, 12641-12646 (2012).




23. Li, B. et al. Synaptic potentiation onto habenula neurons in the learned 
helplessness model of depression. Nature 470, 535-539 (2011). 

24. Yetnikoff, L., Cheng, A. Y., Lavezzi, H. N., Parsley, K. P. & Zahm, D. S. Sources of 
input to the rostromedial tegmental nucleus, ventral tegmental area, and 
lateral habenula compared: a study in rat. J. Comp. Neurol. 523, 2426-2456 
(2015). 

25. Quina, L. A. et al. Efferent pathways of the mouse lateral habenula. J. Comp. 
Neurol. 523, 32-60 (2015). 

26. Felton, T. M., Linton, L., Rosenblatt, J. S. & Morell, J. I. First and second order
maternal behavior related afferents of the lateral habenula. Neuroreport 10, 
883-887 (1999). 


27. Beck, A., Heinz, A. J. & Heinz, A. Translational clinical neuroscience perspectives
on the cognitive and neurobiological mechanisms underlying alcohol-related
aggression. Curr. Top. Behav. Neurosci. 17, 443-474 (2014).

28. Martin, L. A., Neighbors, H. W. & Griffith, D. M. The experience of symptoms of
depression in men vs women: analysis of the National Comorbidity Survey 
Replication. JAMA Psychiatry 70, 1100-1106 (2013). 

29. Bewernick, B. H., Kayser, S., Sturm, V. & Schlaepfer, T. E. Long-term effects of 
nucleus accumbens deep brain stimulation in treatment-resistant depression: 
evidence for sustained efficacy. Neuropsychopharmacology 37, 1975-1985 
(2012). 

30. Sartorius, A. & Henn, F. A. Deep brain stimulation of the lateral habenula in 
treatment resistant major depression. Med. Hypotheses 69, 1305-1308 
(2007). 


Acknowledgements This research was supported by US National Institutes
of Health grants R01 MH090264, P50 MH096890 and P50 AT008661-01
(S.J.R.), R01 MH092306 (M.H.H.), T32 MH087004 (M.L.P., M.H. and M.F.), T32
MH096678 (M.L.P.), F30 MH100835 (M.H.), F31 MH105217 (M.L.P.), National
Institute of General Medical Sciences 1FI2GM117583-01 (S.A.G.) and the
National Natural Science Foundation of China 81200862 (H.Z.). We would like
to thank K. Miczek and Y. Shaham for their input.


Author Contributions S.A.G. and S.J.R. designed and wrote the manuscript. 
S.A.G., D.J.C., M.H., C.M., J.J.W., M.L.P., N.R.H.A., G.E.H., M.F., D.B., L.K., J.T. and
B.K. collected behavioural and immunohistochemistry data and aided in data 
analysis. H.Z., M.-H.H., D.C., K.G. and M.L.S. designed, carried out and analysed 
electrophysiological experiments. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of the paper. 
Correspondence and requests for materials should be addressed to 
S.J.R. (scott.russo@mssm.edu). 


Reviewer Information Nature thanks O. Hikosaka and the other anonymous 
reviewer(s) for their contribution to the peer review of this work. 




METHODS 

Animals. Male CD-1 (ICR) mice (35-45 g, sexually experienced retired breeders; 
Charles River Laboratories (CRL)) were obtained at 4 months of age. All breeders 
were confirmed by CRL to have had equal access, experience and success as breeders. 
Male C57BL/6J mice (20-30 g; The Jackson Laboratory) were obtained at 
7-8 weeks of age and used as novel intruders. All mice were allowed 1 week of
acclimation to the housing facilities before the start of experiments. CD-1 mice were
single housed, and C57BL/6J mice were group housed. All mice were maintained
on a 12 h light:dark cycle with ad libitum access to food and water. Procedures were
performed in accordance with the National Institutes of Health Guide for Care 
and Use of Laboratory Animals and the Icahn School of Medicine at Mount Sinai 
Institutional Animal Care and Use Committee. 

Aggression screening and ethological analysis. Aggression screening was performed
as previously described10. After a minimum of 1 week of habituation to home
cages, experimental CD-1 mice were exposed to a novel C57BL/6J intruder for
3 min daily over 3 consecutive days. Each intruder presentation was performed
in the home cage of the CD-1 mouse between 1 and 3 PM daily under white light
conditions. During screening sessions the cage top along with feeding and water
apparatus were replaced with a clear Plexiglas cover to allow unimpeded viewing
and video recording of screening sessions. The duration and number of screening 
sessions were selected to prevent induction of stress- and anxiety-related behav- 
iours in CD-1 mice (Extended Data Tables 1 and 2), which has been shown to 
occur during extended antagonist encounters*. This allows for separation between 
aggression and stress-related states. All screening sessions were video recorded for 
later ethological analysis using a digital colour video camera. Ethological analysis 
of aggression behaviour was performed by two blinded observers, recording 
(1) latency to initial aggression, (2) the number of aggressive bouts, (3) the total 
duration of aggression, and (4) the mean duration of aggressive bouts. Operational
definitions for these behaviours are as follows: initiation of aggression is
defined by the first clear physical antagonist interaction initiated by the CD-1 
mouse, not including grooming or pursuit behaviour. Aggressive bouts are defined 
by cycles of initiated aggression with continuous orientation by the CD-1 mouse 
towards the intruder, and only defined as completed when the CD-1 mouse has 
physically reoriented away from the intruder. This definition allows for slight 
breaks (less than 5s) in continuous physical interaction within an aggressive 
bout, assuming the CD-1 mouse has remained oriented towards the intruder 
throughout. CD-1 mice were defined as AGG if they initiated aggression during 
any of the three screening sessions and NON were defined as those that showed no 
aggression during any screening sessions. All aggression screening was halted if an
intruder showed any signs of injury, in accordance with our previously published
protocol10.
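
The four ethological measures can be derived from scored contact times. The following is a minimal sketch, not the authors' scoring procedure: it assumes each aggressive contact was scored as a (start, end) pair in seconds from intruder introduction, that the CD-1 mouse stayed oriented towards the intruder across any break shorter than 5 s, and that bout duration includes those short breaks. The function name and input format are hypothetical.

```python
def summarize_aggression(contacts, max_gap=5.0):
    """Merge scored contacts into bouts and return the four measures.

    contacts: list of (start_s, end_s) tuples (hypothetical scoring output).
    max_gap: breaks shorter than this many seconds stay within one bout.
    """
    bouts = []
    for start, end in sorted(contacts):
        if bouts and start - bouts[-1][1] < max_gap:
            # Short break: extend the current bout (assumes continuous orientation)
            bouts[-1][1] = max(bouts[-1][1], end)
        else:
            bouts.append([start, end])
    if not bouts:
        # No aggression in this session (a NON-like session)
        return {"latency": None, "n_bouts": 0, "total": 0.0, "mean": 0.0}
    durations = [end - start for start, end in bouts]
    total = sum(durations)
    return {
        "latency": bouts[0][0],        # (1) latency to initial aggression
        "n_bouts": len(bouts),         # (2) number of aggressive bouts
        "total": total,                # (3) total duration of aggression
        "mean": total / len(bouts),    # (4) mean bout duration
    }
```

Under these assumptions, two contacts separated by a 2-s break count as one bout, whereas a 20-s break starts a new bout.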

Aggression CPP and behavioural analysis. The aggression CPP protocol, devel- 
oped on the basis of a previously published cocaine CPP protocol*’, consisted 
of three phases: pre-test, acquisition, and test. Mice were acclimated to the test- 
ing facility for 1h before testing. All phases were conducted under red light and 
sound-attenuated conditions. The CPP apparatus (Med Associates) consisted of 
two unique conditioning chambers with a neutral middle zone that allowed for 
unbiased entry into either conditioning chamber at the initiation of each trial. 
All CPP sessions were video recorded using Noldus Ethovision 3.0 (Noldus 
Information Technology). During the pre-test phase, mice were placed into the 
middle chamber of the conditioning apparatus and allowed to freely explore the full 
extent of the CPP apparatus for 20 min. There were no group differences in bias for 
either chamber, and conditioning groups were balanced in an unbiased fashion to 
account for start side preference. The acquisition phase consisted of three successive 
days with two conditioning trials each day for a total of six acquisition trials. 
Morning trials (between 8:00 AM and 10:00 AM) and afternoon trials (between 
3:00 PM and 5:00 PM) consisted of CD-1 mice confined to one chamber for 10 min 
while in the presence or absence of a novel C57BL/6J intruder. All groups were
counterbalanced for conditioning chamber. A total of three conditioning trials each to
the intruder-paired and intruder-unpaired contexts were performed. On the test
day, CD-1 mice were placed into the middle arena of the CPP apparatus without an 
intruder and allowed to freely explore both chambers for 20 min. Analysis of dura- 
tion spent within either context was used to identify a CPP or CPA to the intruder- 
paired context. For optogenetic experiments, stimulation was performed during 
the full duration of the test phase. Total locomotor responses were also assessed 
to ensure equal exploratory behaviour between groups. Behavioural analysis of 
aggression CPP data was performed by assessing (1) normalized CPP (test phase 
duration spent in the intruder-paired chamber divided by the pre-test duration 
spent in the intruder-paired chamber, accounting for behaviour during both 
sessions), (2) subtracted CPP (test phase duration spent in the intruder-paired 
chamber minus test phase duration spent in the intruder-unpaired chamber, 
accounting for test session behaviour only), and (3) group and individual durations 
in both pre-test and test sessions. 
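
The two summary scores defined above reduce to a ratio and a difference. A minimal sketch, assuming chamber durations are available in seconds for each mouse; the function and argument names are hypothetical and are not taken from the authors' analysis.

```python
def cpp_scores(pre_paired_s, test_paired_s, test_unpaired_s):
    """Return (normalized CPP, subtracted CPP) for one mouse.

    pre_paired_s: pre-test time in the intruder-paired chamber (s)
    test_paired_s: test time in the intruder-paired chamber (s)
    test_unpaired_s: test time in the intruder-unpaired chamber (s)
    """
    if pre_paired_s <= 0:
        raise ValueError("pre-test duration must be positive")
    # Normalized CPP: test duration divided by pre-test duration (paired side)
    normalized = test_paired_s / pre_paired_s
    # Subtracted CPP: paired minus unpaired duration, test session only
    subtracted = test_paired_s - test_unpaired_s
    return normalized, subtracted
```

A normalized score above 1 or a positive subtracted score points towards a preference (CPP) for the intruder-paired context; values below 1 or negative values point towards an aversion (CPA).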

Sensory CPP. Sensory CPP was performed and analysed identically to the
aggression CPP procedure, with the exception that the intruder C57BL/6J was 
placed within a physical barrier to provide only sensory contact with the resident 
CD-1 mice. 

Cocaine CPP. A previously published cocaine CPP protocol*” was used, which 
consisted of three phases: pre-test, acquisition, and test. Mice were acclimated to 
the testing facility for 1h before testing. All phases were conducted under red light 
and sound-attenuated conditions. The CPP apparatus (Med Associates) consisted 
of two unique conditioning chambers with a neutral middle zone that allowed 
for unbiased entry into either conditioning chamber at the initiation of each 
trial. All CPP sessions were video recorded using Noldus Ethovision 3.0 (Noldus 
Information Technology). During the pre-test phase, mice were placed into the 
middle chamber of the conditioning apparatus and allowed to freely explore the full 
extent of the CPP apparatus for 30 min. There were no group differences in bias for 
either chamber, and conditioning groups were balanced in an unbiased fashion to 
account for start side preference. The acquisition phase consisted of two successive 
days with two conditioning trials each day for a total of four acquisition trials. 
Morning trials (between 8:00 AM and 10:00 AM) and afternoon trials (between 
3:00 PM and 5:00 PM) consisted of CD-1 mice confined to one chamber for 20 min 
paired with an intraperitoneal injection of cocaine (10 mg kg⁻¹); afternoon sessions
were paired with saline injections. All groups were counterbalanced for condition- 
ing chamber. On the test day, CD-1 mice were placed into the middle arena of the 
CPP apparatus and allowed to freely explore both chambers for 30 min. Analysis 
of duration spent within either context was used to identify a CPP or CPA to the 
cocaine-paired context. For optogenetic experiments, stimulation was performed 
during the full duration of the test phase. Total locomotor responses were also 
assessed to ensure equal exploratory behaviour between groups. 

Sucrose preference. Sucrose preference was performed as previously described*’. 
One week after the final screening session, AGG and NON mice had their standard 
water bottle removed and replaced with two 50-ml conical tubes with sipper tops 
filled with water. After a 24-h habituation period, water from one 50-ml conical 
tube was replaced with 1% sucrose. All tubes were weighed, and mice were allowed 
24h to drink. Tubes were then reweighed, and their locations in the wire tops 
were switched before a second 24-h period of drinking. At the end of the second 
day of sucrose testing, preference was calculated as the total amount of sucrose 
consumption divided by the total amount of fluid consumed over the 2 days of 
sucrose availability. 
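
The preference calculation can be sketched as follows, assuming daily consumption of each fluid is taken as the tube weight before minus the weight after each 24-h period. The function and argument names are hypothetical.

```python
def sucrose_preference(sucrose_g, water_g):
    """Sucrose preference over the days of sucrose availability.

    sucrose_g, water_g: per-day consumption in grams
    (tube weight before minus weight after each 24-h period).
    """
    total_sucrose = sum(sucrose_g)
    total_fluid = total_sucrose + sum(water_g)
    if total_fluid <= 0:
        raise ValueError("no fluid consumption recorded")
    # Total sucrose consumed divided by total fluid consumed
    return total_sucrose / total_fluid
```

For example, 6 g of sucrose solution and 2 g of water on each of the 2 days gives a preference of 12/16 = 0.75.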

Elevated plus maze. The elevated plus maze was performed as previously 
described**. One week after the final screening session, AGG and NON mice
were acclimated to the testing facility for 1 h before testing and then placed in the
elevated plus maze under red light conditions for 5 min. Each arm of the maze
measured 12 × 50 cm. The Plexiglas cross-shaped maze consisted of two open
arms with no walls and two closed arms (40-cm-high walls) and was on a pedestal 
1m above floor level. Behaviour was tracked using an automated system (Noldus 
Ethovision; Noldus Interactive technologies). Behaviour was measured as total 
time in open and closed arms. 

Open field and locomotor measures. Open field was performed as previously 
described*’. One week after the final screening session, AGG and NON mice were 
acclimated to the testing facility for 1 h before testing. Open-field testing was performed
in black Plexiglas arenas (42 × 42 × 42 cm; Nationwide Plastics) under red
light conditions. Testing sessions were 10 min long. Behaviour was tracked using an
automated system (Noldus Ethovision; Noldus Interactive technologies) to record
the total distance moved and time spent in the total arena and a delineated ‘centre
zone’ (24 cm × 24 cm).

Forced-swim test. The forced-swim test was performed as previously described*4. 
One week after the final screening session, AGG and NON mice were placed in the 
test room for an hour before behavioural testing for habituation. Mice were tested 
in a 4-litre Pyrex glass beaker, containing 2 litres of water at 25 ± 1 °C for 6 min.
Behaviour was videotaped (Noldus Ethovision; Noldus Interactive technologies)
and analysed for duration immobile, duration mobile and total movement.

Social interaction (approach). Social approach testing was performed as
previously described!’. One week after the final screening session, AGG and
NON mice were acclimated to the testing facility for 1h before testing, and all 
testing was performed under red light conditions. Mice were placed in an open 
field black Plexiglas arena (42 x 42 x 42 cm; Nationwide Plastics) with a small 
animal cage placed at one end. Their movements were then automatically moni- 
tored and recorded (Ethovision 3.0; Noldus Information Technology) for 2.5 min in 
the absence (target absent phase) of a social target. This phase is used to determine 
baseline exploratory behaviour. We then immediately measured 2.5 min of explor- 
atory behaviour in the presence of a caged novel CD-1 or C57BL/6J mouse (target 
present phase), again recording total distance travelled and duration of time spent 
in the interaction and corner zones. Social interaction behaviour is determined by 
the total time spent in each zone. 




Novel object versus social target preference. The novel object versus social target 
test consisted of two phases: pre-test and test on consecutive days, as previously 
described**. One week after the final screening session, AGG and NON mice were 
acclimated to the testing facility for 1h before both sessions. All phases were run 
under red-light and sound-attenuated conditions. The testing apparatus (Med 
Associates) consisted of two identical chambers, with a neutral middle zone that 
allowed for unbiased entry into either chamber at the initiation of each trial. All ses- 
sions were video recorded from above (Noldus Ethovision 3.0, Noldus Information 
Technology) for later behavioural analysis. Briefly, during the pre-test phase, mice 
were placed into the middle chamber of the apparatus and allowed to freely explore 
all zones for 5 min. There were no group differences in pre-test preference for either 
chamber. Conditioning groups were then balanced in an unbiased way to account 
for individual animals’ preference. On the test day, mice were placed back into the 
apparatus in the presence of both a novel object (an upside-down steel-bar pencil 
holder) on one side and a social target (identical pencil holder containing either a 
novel CD-1 or a C57BL/6J mouse) on the other. Test mice were allowed to freely 
explore the apparatus for 5 min. The time spent in each chamber was recorded 
and used for analysis. The subtracted social score is derived by subtracting time 
in social-paired chamber from time in novel object-paired chamber during the 
test phase. Normalized social score is the ratio of time spent in the chamber of 
interest (social target or novel object) during the test phase over the pre-test phase. 

Blood sampling and testosterone/corticosterone ELISA. Submandibular vein
bleeds were taken from mice 4-24 h after the final screening session as previously
described*°. Serum testosterone (R&D Systems, Testosterone Parameter Assay
Kit) and corticosterone (Immunodiagnostic Systems, IDS Corticosterone EIA Kit)
levels were assessed via ELISA according to the manufacturers' specifications. Briefly,
blood was collected in a serum separator tube, allowed to clot for 30 min at room
temperature, centrifuged at 1,000g for 15 min and stored frozen (−20 °C) until
analysis. The sensitivity of the testosterone assay (minimum detectable dose ranging
from 0.012 to 0.041 ng ml⁻¹) falls well below the ranges detected experimentally
within our cohort (lowest serum testosterone concentration of 2.58 ng ml⁻¹).

Perfusion and brain tissue processing. For immunohistochemistry and histology,
mice were given a euthanizing dose of 15% chloral hydrate and transcardially perfused
with cold 1% paraformaldehyde in PBS (pH 7.4) followed by fixation with
cold 4% paraformaldehyde in PBS. Brains were dissected and post-fixed for 18 h in
the same fixative. Coronal sections were prepared on a vibratome (Leica) at 50 μm
to assess viral placement and immunohistochemistry. For in situ hybridization,
mouse brains were rapidly removed and flash frozen in −30 °C isopentane for 60 s
and then kept at −80 °C until sectioning.

Immunohistochemistry, in situ hybridization and confocal microscopy. 
For c-Fos experiments, sections were incubated overnight in blocking solution 
(3% normal donkey serum, 0.3% Triton X-100 in PBS), washed in PBS for 2h, 
then incubated for 48h in primary antibody (rabbit anti-c-Fos (Santa Cruz 
Biotechnology, SC-42) 1:2,000). Slices were then washed in PBS for 2 h, incubated
in secondary antibody for 2 h (donkey anti-rabbit Cy2 1:200 (Jackson
ImmunoResearch)), then washed in PBS for 30 min before staining with 1 μg ml⁻¹
DAPI (Sigma) for 20 min. Sections were then mounted, air-dried overnight and
coverslipped with hardset Vectashield (Jackson ImmunoResearch). All slices were
imaged using a Zeiss LSM 780. For c-Fos analysis, all images were taken at ×20
magnification for both the BF and Hb, using the tile-scan function to span the
entire region of interest. Analysis of c-Fos-positive nuclei was performed using
NIH ImageJ in conjunction with the ‘analyze particle’ function on single images.
For representative images demonstrating the areas of viral infection, images were
acquired at ×10 magnification using the tile-scan function.

For all other immunohistochemistry, coronal sections (50 μm) were used.
Sections were incubated in blocking solution
(3% normal donkey serum, 0.3% Triton X-100 in PBS) for 1 h. Sections were
then incubated in primary antibody overnight at 4 °C (VGAT 1:500 (Synaptic
Systems); VGLUT1 (Millipore) 1:5,000; 1:250; GFP (Aves) 1:1,000). Next, sections
were washed in PBS for 60 min and then incubated in secondary antibody
for 2 h (donkey anti-guinea pig Cy5 1:400; donkey anti-goat Cy5 1:400; donkey
anti-chicken Cy2 1:400 (Jackson ImmunoResearch)), then washed with PBS for
60 min, stained with 1 μg ml⁻¹ DAPI (Sigma) for 20 min, mounted and air-dried
overnight. Slides were quickly washed in ethanol (70%, 95%, 100%) and Citrosolv
(Fisher), and coverslipped with DPX mounting medium (Electron Microscopy
Sciences). All slices were imaged using a Zeiss LSM 780. For puncta imaging, 1-μm
z-stacks were taken at ×100 magnification for both the BF and lHb. Deconvolution
was performed on all z-stacks with AutoQuant X (MediaCybernetics). For representative
images demonstrating the area of viral infection, images were acquired
at ×100 magnification using the tile-scan function.

For in situ hybridization, RNAscope Multiplex Fluorescent Kits (Advanced Cell
Diagnostics) were used with the company-provided procedure. Briefly, fresh frozen
brains were slide mounted at 16 μm thickness, fixed for 15 min in cold 4% PFA, serially
dehydrated with increasing EtOH concentration washes (50%, 75%, 100% EtOH
for 2 min each), and pre-treated with protease reagent (Protease IV, RNAscope) for
20 min. Proprietary probes (Advanced Cell Diagnostics) for eGFP (Channel 1) or
GAD67 (Channel 2) were hybridized at 40 °C for 2 h, and then subjected to a series
of amplification steps at 40 °C (1-FL: 30 min; 2-FL: 15 min; 3-FL: 30 min; 4-FL:
15 min). For the fourth amplification step, Reagent Alt-A was used, corresponding
with Channel 1 visualization at 488 nm and Channel 2 at 550 nm. Finally, slides
were treated for 2 min with DAPI, and immediately coverslipped with EcoMount.

In vitro electrophysiology. All recordings were performed blind to experimental
condition, and performed in both NON and AGG CD-1 mice. For optogenetic
slice electrophysiology, mice were anaesthetized with isoflurane, and perfused with
cold artificial cerebrospinal fluid (aCSF) composed of (in mM): 128 NaCl, 3 KCl,
1.25 NaH2PO4, 10 D-glucose, 24 NaHCO3, 2 CaCl2 and 2 MgCl2 (oxygenated with
95% O2 and 5% CO2, pH 7.35, 295-305 mOsm) as described in our previous
work?”*8. Briefly, acute brain slices containing the lHb were cut using a microslicer
(DTK-1000, Ted Pella) in 95% O2 and 5% CO2 saturated sucrose-aCSF, which was
derived by fully replacing NaCl with 254 mM sucrose. Slices were maintained in
the holding chamber containing aCSF for 1 h at 37 °C. Slices were then transferred
into a recording chamber fitted with a constant flow rate of aCSF equilibrated with
95%/5% O2/CO2 (2.5 ml min⁻¹) maintained at 35 °C. Cell-attached recording mode
was used to measure the firing rates of Hb neurons. In these recording experiments,
glass recording pipettes (7-10 MΩ) were filled with an internal solution
composed of (in mM): 115 potassium gluconate, 20 KCl, 1.5 MgCl2, 10 phosphocreatine,
10 HEPES, 2 magnesium ATP and 0.5 GTP (pH 7.2, 285 mOsm). For the
experiments to measure inhibitory postsynaptic currents, whole-cell recordings
were performed under voltage-clamp mode (holding at −70 mV) in the presence
of kynurenic acid (1 mM) with or without gabazine (2 μM) in aCSF. Glass recording
pipettes (3-4 MΩ) for these whole-cell studies were filled with an internal solution
composed of (in mM): 120 CsCl, 10 phosphocreatine-Na, 10 HEPES, 10 EGTA,
2 ATP-Mg, 0.3 GTP-Tris (pH 7.2, 285 mOsm). Data acquisition was conducted
using a Digidata 1440A digitizer and pClamp 10.2 (Axon Instruments).

Stereotaxic surgery and viral gene transfer. All surgeries were performed under
aseptic conditions using anaesthetic. Briefly, mice were anaesthetized with a mixture
of ketamine (100 mg per kg body weight) and xylazine (10 mg per kg body weight)
and positioned in a small-animal stereotaxic instrument (David Kopf Instruments)
and the skull surface was exposed. Thirty-three-gauge syringe needles (Hamilton
Co.) were used to bilaterally infuse either 0.5 μl (BF) or 0.4 μl (lHb) of virus over a
5 min period and the needle was removed after 5 min. NAc shell-septum transition
zone BF stereotaxic coordinates were taken from bregma (anteroposterior, +1.5 mm;
mediolateral, +1.6 mm; dorsoventral, −4.4 mm; angle 10°). lHb stereotaxic coordinates
were taken from bregma (anteroposterior, −1.7 mm; mediolateral, +0.4 mm;
dorsoventral, −2.5 mm; angle 0°). For lHb optogenetic experiments, animals were
implanted with an optical fibre at the same time as viral injection (dorsoventral,
−2.0 mm). For secure fixture of the implantable fibre to the skull, the skull was
dried and then industrial-strength dental cement (Grip cement; Dentsply) was
added between the base of the implantable fibre and the skull. For non-conditional
axonal tract tracing, 0.5 μl AAV2-hSyn-eYFP (1.5 × 10¹¹ infectious units per ml,
UNC Vector Core) was injected bilaterally into the BF. For retrograde tracing, 0.4 μl
G-deleted-rabies-eGFP (1.33 × 10* infectious units per ml, Salk Gene Transfer
Targeting and Therapeutics Core) was injected into the lHb. For behavioural optogenetic
experiments, 0.5 μl of non-conditional AAV2-hSyn-eYFP,
AAV2-hSyn-hChR2(H134R)-eYFP or AAV-hSyn-eNpHR3.0-eYFP (1.5 × 10¹¹ infectious
units per ml, UNC Vector Core) were injected into the BF (terminal stimulation)
or lHb (cell body stimulation). All non-rabies AAV injections were performed
4-6 weeks before tracing or behavioural experiments; rabies-infected
brains were collected 7 days after injection.

Blue light stimulation. Optical fibres (Thor Labs, BFL37-200) were connected 
using an FC/PC adaptor to a 473-nm blue laser diode (Crystal Laser, BCL-473- 
050-M) and a stimulator (Agilent Technologies, no. 33220A) was used to generate 
blue light pulses. For all in vivo behavioural experiments and optrode recordings, 
mice were given 40 Hz 5 ms light stimulations. Intensity of light delivered to ferrule
was ~10 mW. These parameters are consistent with previously validated and
published protocols for NAc medium spiny neurons**” and lHb neurons!".

Yellow light stimulation. Optical fibres (Thor Labs, SFS200/220Y) were connected
using an FC/PC adaptor to a 561-nm yellow laser diode (Crystal Laser, CL561- 
050-L), and a stimulator (Agilent Technologies, no. 33220A) was used to generate 
yellow light pulses. For in vivo optrode recordings we tested a protocol of 8s of 
yellow light on followed by 2s of light off. Intensity of light delivered to ferrule was 
~10mW. These parameters are consistent with previously validated and published 
protocols for NAc MSNs and lHb neurons**.
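
The two stimulation patterns differ mainly in their timing. As an illustration only, the sketch below computes pulse onset times and the fraction of time the light is on for the blue (40 Hz, 5 ms pulses) and yellow (8 s on, 2 s off) protocols. The helper names are hypothetical, and this is not the code used to drive the stimulator.

```python
def pulse_onsets(freq_hz, duration_s):
    """Onset times (s) of a regular pulse train starting at t = 0."""
    n_pulses = int(round(freq_hz * duration_s))
    return [i / freq_hz for i in range(n_pulses)]


def duty_cycle(on_s, period_s):
    """Fraction of each cycle during which the light is on."""
    return on_s / period_s


# Blue light: 5 ms pulses at 40 Hz, so one cycle lasts 25 ms
blue_duty = duty_cycle(0.005, 1 / 40)

# Yellow light: 8 s on followed by 2 s off, so one cycle lasts 10 s
yellow_duty = duty_cycle(8.0, 8.0 + 2.0)
```

With these parameters the blue light is on for about 20% of the time and the yellow light for 80%.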

In vivo recordings 

Optrode construction and implantation. An optrode was constructed by gluing four 
tetrodes to an optical fibre. Four tetrodes spun of 12.7-μm-diameter nichrome wire
(California Fine Wire) were glued to a 200-μm-diameter optical fibre (Thor Labs,
SFS200/220Y) and cut so that they extended between 750 and 250 μm beyond the
end of the fibre. The tetrodes were pinned into an electrode interface board (EIB;
Neuralynx) and the tips were plated by passing 0.2 μA current pulses through the
individual wires and a gold solution until the impedance reached 150-200 kΩ.
The optrode was mounted on a stereotax arm (Kopf Instruments) and then lowered 
into the brain during surgery. Two small holes were drilled anterior and posterior 
to the recording site to serve as sites for ground screws. The ground screws were 
constructed by soldering stainless steel self-tapping screws to 3mm stainless 
steel wire secured to the EIB. Screws were inserted far enough to come in contact 
with dura. 

Recording. Recordings were carried out using a Digital Lynx 16SX recording 
system and Cheetah data acquisition software (Neuralynx). Signals from the tetrodes 
were bandpass filtered between 600 and 9,000 Hz and digitized at 32 kHz. Spike 
detection was performed in real time using a thresholding procedure: when the 
filtered signal reached threshold amplitude on any wire, a sweep including 8 data
points before the crossing and 24 points after (32 points, or 1 ms) was saved as a
putative spike event. Spike sorting and noise filtering were performed offline. The
laser intensity was adjusted to ~5 mW at the tip of the optrode before implantation. 
The optrode was lowered using the stereotax arm until the tetrode tips reached the 
dorsal extent of the lHb. Once the tissue and recordings stabilized, the optrode was
slowly advanced until spikes were observed on at least one of the tetrodes. Spike 
amplitude and firing rate were allowed to stabilize and observed for several minutes 
before recording. For all trials a 30s baseline recording was acquired, followed by 
1 min of stimulation and ending with a 30s post-stimulation baseline. The optrode 
was then stepped forward and this procedure repeated until the inferior extent of 
the lHb was reached.
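
The thresholding step described above can be illustrated offline. This is a minimal sketch, not the Cheetah acquisition software: it assumes the band-passed signal is held in a NumPy array of shape (wires, samples) sampled at 32 kHz, that a crossing is the first sample at or above threshold on any wire, and that the crossing sample is the first of the 24 post-crossing points. The function name is hypothetical, and overlapping events are not handled.

```python
import numpy as np


def extract_sweeps(filtered, threshold, pre=8, post=24):
    """Cut fixed-length sweeps around upward threshold crossings.

    filtered: array of shape (n_wires, n_samples), band-passed signal.
    Returns (crossing_indices, sweeps), where sweeps has shape
    (n_events, n_wires, pre + post); 32 points at 32 kHz span 1 ms.
    """
    # True wherever any wire is at or above threshold
    above = (filtered >= threshold).any(axis=0)
    # Upward crossings: above threshold now, below on the previous sample
    crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    n_samples = filtered.shape[1]
    # Keep only crossings with a full window inside the recording
    keep = [c for c in crossings if c - pre >= 0 and c + post <= n_samples]
    sweeps = np.array([filtered[:, c - pre:c + post] for c in keep])
    return np.array(keep), sweeps
```

Each saved sweep keeps all wires of the tetrode, so the waveform on every channel is available for later spike sorting.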

Analysis. Data were analysed using custom scripts written in Matlab (MathWorks). 
A first round of preliminary spike sorting was carried out using spike waveforms 
as parameters in KlustaKwik“*. The output from KlustaKwik was then imported 
into Matlab and clusters were manually edited using custom spike sorting software. 
Clearly separated clusters of spikes were assigned to functional units and entered 
into further analysis; noise spikes (for example, from spurious threshold crossings) 
and units that fired fewer than 100 spikes during recording were discarded. Spike 
rates were calculated in 2-s non-overlapping bins across the baseline and stimulation 
epochs. The resulting functions were smoothed using a Gaussian window with a 
standard deviation of 10 s. The rate function for each unit was then z-scored across 
all three epochs. For statistical analyses, rates were calculated in either 15-s bins 
or bins encompassing the entire baseline and stimulation periods. No smoothing 
was applied. The rate functions for each unit were z-scored across all three epochs 
and the z-scored rate functions were used to assess statistical significance. 
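
As a rough illustration of the rate pipeline above (binning, Gaussian smoothing, z-scoring), the following sketch uses only the Python standard library rather than the authors' Matlab scripts. The function names, the truncation of the Gaussian kernel at four standard deviations, the renormalization at the edges, and the use of the population standard deviation are all assumptions.

```python
import math
from statistics import mean, pstdev


def binned_rate(spike_times, t_start, t_stop, bin_s=2.0):
    """Firing rate (spikes per second) in non-overlapping bins."""
    n_bins = int((t_stop - t_start) // bin_s)
    counts = [0] * n_bins
    for t in spike_times:
        k = int((t - t_start) // bin_s)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [c / bin_s for c in counts]


def gaussian_smooth(rates, bin_s=2.0, sd_s=10.0):
    """Smooth with a Gaussian window whose s.d. is given in seconds."""
    sd_bins = sd_s / bin_s
    half = int(math.ceil(4 * sd_bins))  # assumption: truncate at 4 s.d.
    smoothed = []
    for i in range(len(rates)):
        num = den = 0.0
        for j in range(max(0, i - half), min(len(rates), i + half + 1)):
            w = math.exp(-0.5 * ((j - i) / sd_bins) ** 2)
            num += w * rates[j]
            den += w
        smoothed.append(num / den)  # renormalize so edges are not biased low
    return smoothed


def zscore(values):
    """Z-score across all supplied bins (all three epochs together)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]
```

With the trial structure described above (30 s + 1 min + 30 s), `binned_rate(spikes, 0, 120)` would return 60 bins of 2 s each.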

Randomization and blinding. All experimenters were blinded to experimental 
condition. Mice were first screened to determine whether they were aggressive or 
non-aggressive and then randomly assigned to optogenetic viral conditions for 
further behavioural analysis. For behavioural studies in Fig. 1 and slice physiology and 
c-Fos mapping studies in Fig. 2, AGGs and NONs were pre-screened for aggression 
and assigned to groups on the basis of their behavioural profile. 

Statistical analysis. Sample size was calculated based on previous studies using 
StatMate from GraphPad Prism (GraphPad Software). All t-tests, one-way 
ANOVAs, two-way ANOVAs and chi-squared tests were performed using GraphPad 
Prism software (GraphPad Software Inc.). Bonferroni correction was used as a 
post-hoc test when appropriate for one-way and two-way ANOVAs. Normality was 
determined by D’Agostino-Pearson, Shapiro-Wilk and Kolmogorov-Smirnov 
normality tests. Statistical significance was set at P < 0.05. 

Extended Data Figure 1 | Social behaviours exhibited by resident CD-1 
and intruder C57 mice during aggression screening. a, Experimental 
schematic of aggression screening procedure used in a subset 
(40 residents and 40 intruders) of mice to quantify social behaviours. 
b-e, Bouts of attacks (F2,156 = 13.10, two-way ANOVA ***P < 0.0001; 
post-hoc test ***P < 0.001; n = 40 per group) (b), pursuits (c), withdrawals 
(F2,156 = 5.745, two-way repeated measures ANOVA **P < 0.001; 
post-hoc test ***P < 0.001; n = 40 per group) (d) and non-aggressive 
social approaches (e). f-i, Duration of attacks (F2,156 = 7.069, two-way 
repeated measures ANOVA **P < 0.001; post-hoc test ***P < 0.001; 
n = 40 per group) (f), pursuits (g), withdrawals (h) and non-aggressive 
social approaches (i). All data are presented as mean ± s.e.m. 


© 2016 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 2 panels: screening schematic (three 3-min physical interaction sessions with a novel C57BL/6J intruder); groups NONs (138) and AGGs (310); axes Latency (s) and Attack latency (10-s bin).] 

Extended Data Figure 2 | Detailed ethological analysis of AGG 
aggression-related behaviours. a, Experimental schematic of aggression 
screening procedure used in a sample (448 mice total; 138 NON and 
310 AGG) of mice. b, Histogram of attack latency frequency using 10-s 
bins. c-e, Mean distribution across screening sessions (left) and individual 
screening sessions (right) for latency to aggression (F2,1333 = 49.37, 
two-way repeated measures ANOVA P < 0.001; post-hoc test, *P < 0.001; 
n = 138-310) (c), number of attack bouts (F2,1333 = 21.03, two-way repeated 
measures ANOVA P < 0.001; post-hoc test, *P < 0.001; n = 138-310) (d) 
and mean attack duration (F2,1333 = 11.96, two-way repeated measures 
ANOVA P < 0.001; post-hoc test, *P < 0.001; n = 138-310) (e). f, g, 
Correlation of mean latency to initial aggression with mean attack bouts 
(Pearson r = −0.78, R² = 0.61, P < 0.0001) (f) and mean duration of attack 
bouts (Pearson r = −0.40, R² = 0.15, P < 0.0001) (g). Distribution plots are 
presented as the median with interquartile range and normality determined by 
D’Agostino-Pearson, Shapiro-Wilk and Kolmogorov-Smirnov normality tests 
(P < 0.0001). Summary data are represented as mean ± s.e.m. 

Extended Data Figure 3 | Aggression CPP behaviour. a, Experimental 
schematic of aggression CPP procedure. b, c, Individual duration spent 
in the intruder-paired context for AGG (t7 = 3.106, *P < 0.05; two-tailed 
paired t-test, n = 8 per group) (b) and NON (t7 = 2.918, *P < 0.05; 
two-tailed paired t-test, n = 8 per group) (c). d, Duration spent in the 
middle neutral chamber during pre-test and test sessions. e, Experimental 
schematic of sensory CPP procedure. f, g, Individual duration spent in the 
intruder-paired context for AGG (f) and NON (g). h, Duration spent in 
the middle neutral chamber during pre-test and test sessions. Summary 
data are represented as mean ± s.e.m. 

Extended Data Figure 4 | BF-lHb circuit tracing and GABAergic 
cell-type specificity. a, Schematic of viral tracing strategy. b, Representative 
BF viral infection with AAV2-hSyn-eYFP. Scale bar: 500 µm. c, Histological 
analysis of viral infection with AAV2-hSyn-eYFP (F3 = 223.0, one-way 
ANOVA ***P < 0.0001, post-hoc test, ***P < 0.0001; n = 3 mice, 
3 slices per mouse) across adjacent anatomical regions. d, e, Whole-cell 
electrophysiological recordings (d) and representative traces of lHb 
neurons photostimulated with AAV2-hSyn-ChR2.0 in the absence or 
presence of bath-applied GABAA receptor antagonist gabazine (2 µM; 
F,7 = 220, one-way ANOVA P < 0.05; post-hoc test, ***P < 0.001, n = 4, 
2, 2 cells from 2 mice) (e). f, Optically evoked IPSC response delay 
(n = 21 oIPSC events, 2 mice). g, Representative images of eYFP^BF-lHb 
terminal colocalization with vesicular GABA transporter (top), and 
not vesicular glutamate transporter 1 (bottom). Scale bars: 10 µm; white 
arrows indicate colocalization within insets. MS, medial septum; pLS, 
posterior lateral septum. Summary data are represented as mean ± s.e.m. 

Extended Data Figure 5 | Multiunit anaesthetized optrode recordings. 
a, Schematic of in vivo anaesthetized multi-unit optrode recording 
procedure (left) and representative optrode placement in lHb (right; 
scale bar: 200 µm). b, c, Heatmaps of normalized firing rates for lHb 
neurons in response to BF terminal stimulation with ChR2^BF-lHb (b) 
or NpHR^BF-lHb (c) and averaged spike waveform shown below for 
pre-stimulation, stimulation and post-stimulation epochs. d, Percentage 
of cells by firing response (top) and average normalized lHb firing rate 
(bottom) after BF-lHb terminal stimulation with ChR2^BF-lHb for all 
identified cells (F2,134 = 8.249, one-way repeated-measure ANOVA 
P < 0.001; post-hoc test, *P < 0.05; n = 68 cells from 3 mice) and cells that 
significantly decreased firing during the stimulation epoch (F;,195 = 8.868, 
one-way repeated-measure ANOVA P < 0.0001; post-hoc test, *P < 0.05; 
n = 16/68 cells from 3 mice). e, Percentage of cells by firing response (top) 
and average normalized lHb firing rate (bottom) after BF-lHb terminal 
stimulation with NpHR^BF-lHb for all identified cells (F2,128 = 10.32, 
one-way repeated-measure ANOVA P < 0.0001; post-hoc test, *P < 0.05; 
n = 65/65 cells from 3 mice) and cells that significantly increased firing 
during the stimulation epoch (F7,293 = 17.58, one-way repeated-measure 
ANOVA P < 0.0001; post-hoc test, *P < 0.05; n = 30/65 cells from 3 mice). 
mHb, medial habenula. Summary data are represented as mean ± s.e.m. 

Extended Data Figure 6 | BF-lHb AAV infection and CPP locomotor 
behaviour. a, Schematic of BF coronal slice (left), alongside representative 
AAV-ChR2-eYFP (top) and AAV-NpHR3.0-eYFP (bottom) infections. 
Scale bar: 500 µm. b, Schematic of lHb coronal slice (left), alongside 
representative images of BF terminal infection by AAV-ChR2-eYFP 
(middle top) and AAV-NpHR3.0-eYFP (middle bottom) within the lHb. 
Scale bar: 200 µm. Representative close-ups of terminal regions shown in 
insets on right. Scale bar: 50 µm. All representative images counterstained 
with DAPI. c, d, Histological analysis of BF infection in NON (c) and AGG 
(d) mice. e, f, Histological analysis of habenular viral infection in NON (e) 
and AGG mice (f). g-j, Total distance travelled (g, h) and mean velocity 
(i, j) between NON and AGG during the CPP pre-test and test phase. All 
data are presented as mean ± s.e.m., and are not significant as determined 
by two-way ANOVA, P < 0.05. dStr, dorsal striatum; mHb, medial 
habenula; MS, medial septum; pLS, posterior lateral septum. 

Extended Data Figure 7 | Direct lHb stimulation bi-directionally 
modulates aggression reward. a, Schematic of viral infection strategy. 
b, c, Representative images of lHb cell body infection in NON (b) and 
AGG (c). Scale bar: 200 µm. d, Histological analysis of lHb viral infection. 
e, Representative CPP traces of NON. NON::NpHR^lHb cell body infection 
mimics the physiological effect of NON::ChR2^BF-lHb terminal stimulation. 
f, Normalized CPP score (t15 = 2.834, *P < 0.05; two-tailed unpaired t-test, 
n = 8-9 per group) and subtracted CPP score (t15 = 3.058, **P < 0.01; 
two-tailed unpaired t-test, n = 8-9 per group) in NON::eYFP^lHb and 
NON::NpHR^lHb. g, Individual duration spent in the intruder-paired 
context for NON::eYFP^lHb (t9 = 0.9129, P > 0.05; two-tailed 
paired t-test, n = 10 per group) and NON::NpHR^lHb (t9 = 2.344, *P < 0.05; 
two-tailed paired t-test, n = 10 per group). h, Representative CPP traces of 
AGG::eYFP^lHb and AGG::ChR2^lHb. i, Normalized CPP score (t18 = 2.692, 
*P < 0.05; two-tailed unpaired t-test, n = 9-11 per group) and 
subtracted CPP score (t18 = 4.203, ***P < 0.01; two-tailed unpaired t-test, 
n = 9-11 per group) for the intruder-paired context in AGG::eYFP^lHb 
and AGG::ChR2^lHb. j, Individual duration spent in the intruder-paired 
context for AGG::eYFP^lHb mice (t10 = 3.212, **P < 0.01; two-tailed paired 
t-test, n = 9 per group) and AGG::ChR2^lHb mice (tg = 1.348, P < 0.05; 
two-tailed paired t-test, n = 11 per group). Summary data are represented 
as mean ± s.e.m. dStr, dorsal striatum; mHb, medial habenula. 

Extended Data Figure 8 | BF-lHb stimulation modulates cocaine CPP. a, Experimental timeline of general anxiety and cocaine CPP testing. 
b-e, BF-lHb stimulation during open field testing (b, c) and elevated plus maze testing (d, e). f, Subthreshold cocaine (10 mg kg⁻¹, intraperitoneal) 
CPP procedure with BF-lHb stimulation during CPP test (t9 = 2.403, P < 0.05; two-tailed unpaired t-test, n = 5-6 per group). 

Extended Data Table 1 | Stress and anxiety behaviours in AGG and NON 
The behavioural data are shown as mean ± s.e.m. and analysed by unpaired Student's t-test. Significance at *P < 0.05. 

Extended Data Table 2 | Social approach behaviours in AGG and NON 
[Table contents not recovered; only the values 7.96 and 7.70 survive.] 
The behavioural data are shown as mean ± s.e.m. and analysed by unpaired Student's t-test. Significance at *P < 0.05. The subtracted social score was derived by subtracting time in the social-paired 
chamber from the novel object-paired chamber during the test phase. Normalized social score is the ratio of time spent in the chamber of interest (social target or novel object) during the test phase 
over the pre-test phase. 

Rates and mechanisms of bacterial mutagenesis 
from maximum-depth sequencing 


Justin Jee, Aviram Rasouly, Ilya Shamovsky, Yonatan Akivis, Susan R. Steinman, Bud Mishra & Evgeny Nudler 


In 1943, Luria and Delbrück used a phage-resistance assay to 
establish spontaneous mutation as a driving force of microbial 
diversity1. Mutation rates are still studied using such assays, 
but these can only be used to examine the small minority of 
mutations conferring survival in a particular condition. Newer 
approaches, such as long-term evolution followed by whole-genome 
sequencing”’, may be skewed by mutational ‘hot’ or ‘cold’ spots**. 
Both approaches are affected by numerous caveats*’. Here we devise 
a method, maximum-depth sequencing (MDS), to detect extremely 
rare variants in a population of cells through error-corrected, 
high-throughput sequencing. We directly measure locus-specific 
mutation rates in Escherichia coli and show that they vary across 
the genome by at least an order of magnitude. Our data suggest that 
certain types of nucleotide misincorporation occur 10⁴-fold more 
frequently than the basal rate of mutations, but are repaired in vivo. 
Our data also suggest specific mechanisms of antibiotic-induced 
mutagenesis, including downregulation of mismatch repair via 
oxidative stress, transcription-replication conflicts, and, in the case 
of fluoroquinolones, direct damage to DNA. 

De novo mutations in bacteria remain a notoriously difficult target 
for high-throughput sequencing. Whereas E. coli mutate fewer than 1 
in 10⁹ bases per generation, high-fidelity polymerases used for library 
preparation polymerase chain reaction (PCR) cause errors in ~4 out of 
10⁷ bases8. Illumina machines misread ~1 in 10³ bases8. Recent 
methods, such as barcoding of reads from the same original DNA molecule9, 
have lowered the error rate of sequencing. However, such methods 
can have low yields'® and do not address errors introduced by PCR. 
PCR errors can be overcome using duplex barcoding, which forms a 
consensus from both strands of a DNA template molecule!!. However, 
even when a small region is targeted'?, duplexing lowers yield even 
further. The mutational landscape of an RNA virus with mutation rate 
10*-fold greater than E. coli was recently mapped using ‘circle sequenc- 
ing. However, this technique is not designed for targeted coverage of 
a single locus, and its accuracy is limited by sequence read length’*. 

We introduce maximum-depth sequencing (MDS) for detecting 
extremely rare variants in any region of interest (ROI) in a population 
of cells (see Methods, Fig. 1a). By synthesizing unique barcodes directly 
onto the ROI of an original genomic DNA molecule and then copying 
that molecule using linear amplification, we increase yield (Fig. 1b) 
and substantially reduce both polymerase and sequencing errors (Fig. 1c). 
On mock cultures with single-nucleotide mutants spiked in at known 
concentrations, MDS reliably recovers the expected proportion of 
mutants at the lowest frequency tested, 10⁻⁶ (Extended Data Fig. 1). 
On in vitro synthesized DNA templates, MDS reduces the error rate to 
less than 5 × 10⁻⁸ per nucleotide sequenced (Fig. 1c, Extended Data 
Fig. 2). By increasing the number of reads used to call a consensus 
sequence (R), MDS can lower error rate indefinitely, given sufficient 
coverage (see Methods, error rate of MDS). Application of a second 
barcode after linear PCR increases accuracy at an even sharper rate and 
was used here to demonstrate library preparation efficiency (Extended 
Data Fig. 2; Supplementary Information: testing sample preparation 
and PCR efficiency). 
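
The barcode-based consensus idea introduced above can be sketched as follows. This is an illustration only: the actual base-calling rules are given in the Supplementary Information and are not reproduced here. The function name `call_consensus`, the requirement that all reads in a family agree at a position, and the masking of disagreements as `N` are assumptions.

```python
from collections import defaultdict


def call_consensus(reads, min_reads=3):
    """Group reads by barcode and call one consensus per barcode family.

    reads: iterable of (barcode, sequence) pairs.
    min_reads: minimum family size R needed to call a consensus.
    Returns {barcode: consensus_sequence}.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        # Skip families that are too small or contain reads of unequal length.
        if len(seqs) < min_reads or len({len(s) for s in seqs}) != 1:
            continue
        called = []
        for column in zip(*seqs):
            # Assumed rule: keep a base only when every read agrees.
            called.append(column[0] if len(set(column)) == 1 else "N")
        consensus[barcode] = "".join(called)
    return consensus
```

Under this assumed rule, an error present in only one read of a family is masked rather than reported as a variant, which is the property that makes larger R more stringent.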

We used MDS to investigate mutation rates in MG1655 E. coli grown 
for < 120 generations. We investigated six ~100-nucleotide ROIs: 
(1) part of the coding sequence (CDS) of the β subunit of RNA polymerase 
(rpoB), which confers rifampicin resistance when mutated; (2) the 3′ 
untranslated region (UTR) of rpoB; (3) the RNA polymerase ω subunit, 
rpoZ; (4) the CDS of cold-shock response gene cspE; (5) the centre of 
the CDS of penicillin-binding protein gene mrcA; and (6) the 3’ end of 
the CDS of mrcA. The last three genes, when knocked out, do not affect 
cell growth14,15. Whereas rpoB, rpoZ, and cspE are highly transcribed, 
mrcA is one of the least-transcribed genes in E. coli under normal 
conditions!°. All ROIs have balanced AT and CG content, are transcribed 
on the leading strand, and lack homopolymers >8 nucleotides (nt). 

Mutation rates in E. coli have been reported from 0.2 × 10⁻¹⁰ to 
5 × 10⁻¹⁰ nt per generation*!®!”. Our calculated rate of mutation in 
rpoB CDS using synonymous substitutions is 4.1 × 10⁻¹⁰ nt per 
generation, comparable to the rate obtained in ref. 17 and at least one long- 
term evolution experiment using MG1655 (ref. 2). Yet it is also higher 
than rates calculated by fluctuation assay and long-term evolution on 
other strains (Fig. 2a, Extended Data Fig. 3). We performed fluctua- 
tion assays and recovered a similar spectrum and low rate of mutation 
to others using such approaches'®. It is likely that the higher rate of 
mutation in rpoB obtained with MDS indicates a rate uninfluenced 
by negative selection, phenotypic lag, or imperfect plating efficiency”. 

Mutation rate in nonessential rpoZ and cspE, as well as rpoB UTR, 
is only slightly higher than that in essential rpoB CDS, but our 
calculated rate of mutation in the middle of mrcA is 3.5 × 10⁻⁹ nt per 
generation, an order of magnitude higher than the observed rate in rpoB 
CDS and significantly higher than the rates of mutation in all other 
ROIs (P < 0.001 by ANOVA). The 3′ end of mrcA also has a higher 
rate of mutation than all other ROIs considered except for the middle 
of mrcA, suggesting spatial clustering of mutation rates. Comparison 
of genomes from several E. coli strains has suggested that clustered, 
highly transcribed genes are protected from mutation by an unknown 
mechanism‘, a finding that has since been challenged*!®. Our results 
demonstrate that at least one gene with low transcription rate has 
significantly higher mutation rate than three others with high tran- 
scription rate. 

The mutational spectrum from MDS matches that found in long- 
term sequencing experiments, with transition mutations favoured 
over transversions (Fig. 3a, Extended Data Figs 4, 5a). We also note 
an unexpectedly high frequency of C→A substitutions. These do not 
appear to be lasting mutations, as complementary G→T 
substitutions emerged with less than 0.1-fold frequency. A similar effect was 
found to a lesser extent for G→A and C→T substitutions. Increasing 
R did not significantly reduce these high substitution frequencies 
(Fig. 3b, Supplementary Information: model of damaged base pairs), 

Figure 1 | Overview of MDS. a, Comparison of traditional barcoding 
protocol with MDS (see Methods for details). Note an additional barcode 
can be attached after linear amplification to further increase accuracy 
(see Extended Data Fig. 2). b, Mean yield of various methods, in consensus 
nucleotides called per nucleotides sequenced. Results from our study 
are boxed. c, Mean error rate of various methods when applied to 
DNA synthesized in vitro, in frequency of miscalled bases (log10 scale). 
Error rates from our study are given using both Phusion (Phu) and Q5 
polymerase. For Q5 with R = 3, analysis of 1,685,502 consensus nucleotides 
yielded no errors; the value shown is extrapolated from the Q5 error 
rate and expected reduction given R = 3. Yield and error rate from 
previous methods are from ref. 10. MDS experiments were performed in 
quadruplicate. Error bars are s.d. 


suggesting that the majority of in vivo C→A substitutions are not due to 
damaged nucleotides. We found that in vitro templates synthesized with 
8-oxoguanine (8-oxoG) resulted in low C→A substitution rates 
(Extended Data Fig. 3c), and treatment of in vivo DNA with 
formamidopyrimidine DNA glycosylase (FPG) did not change the observed 
substitution frequency (Extended Data Fig. 3c), further confirming that 
these C→A substitutions are probably not due to 8-oxoG. It is possible 
that As, or ribonucleotide As, are misincorporated into the genome 
at C sites in vivo. We found that neighbouring Cs are predictive of a 
higher frequency of C→A substitutions, suggesting that these transient 
substitutions cluster spatially along the genome, unlike polymerase or 
sequencing errors (Fig. 3c, Extended Data Figs 3b, 4, 5b). 
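
One way to make the "neighbouring Cs" criterion concrete is sketched below, using the definition given in the Fig. 3c legend (at least two neighbouring Cs within a 2-bp radius). The function names and the treatment of sequence ends (the window is simply truncated) are assumptions, and the sketch classifies reference positions only; it does not reproduce the authors' statistical test.

```python
def neighbouring_c_count(seq, pos, radius=2):
    """Count Cs within `radius` bases of `pos`, excluding `pos` itself."""
    lo = max(0, pos - radius)
    hi = min(len(seq), pos + radius + 1)
    return sum(1 for i in range(lo, hi) if i != pos and seq[i] == "C")


def c_sites_by_context(seq, min_neighbours=2, radius=2):
    """Split C positions into clustered and isolated sites.

    A C is 'clustered' if it has at least `min_neighbours` other Cs
    within `radius` bases on either side.
    """
    clustered, isolated = [], []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        if neighbouring_c_count(seq, i, radius) >= min_neighbours:
            clustered.append(i)
        else:
            isolated.append(i)
    return clustered, isolated
```

Comparing C→A frequencies between the two groups of positions would then correspond to the kind of contrast shown in Fig. 3c.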

In vivo, these misincorporations must be reversed before genome 
replication. However, our observations represent a snapshot of this 
dynamic process before repair can occur. Although these events would 
be invisible to conventional methods, these substitutions occur at a 
frequency of ~10⁻⁶ per nucleotide, over 10⁴ times the true rate of 
mutation. 

Figure 2 | Substitution rates and indel frequencies. a, Comparison 

of mutation rates calculated from fluctuation assays (FA) using either 
rifampicin (Rif) or nalidixic acid (Nal), long-term evolution (LTE), and 
mutation accumulation (MA). Rates calculated using MDS are boxed. Note 
that number of generations is calculated according to population doubling 
time in refs 2 and 3 (see Supplementary Information: generation time 
models). b, Frequency of indel mutations recovered at t= 120 generations. 
Values are normalized for possible indel lengths considered in each 
category. Experiments are biological quadruplicates. All error bars are 95% 
confidence intervals (CI). 


To clarify which substitutions are transient rather than involved in 
'true' mutation, we analysed DNA from bacteria collected after <20 
generations, a short enough time period to expect few true mutations, 
given our sample size (Fig. 3a). We observed enrichment for most types 
of substitutions in our <120 generation trial over our <20 generation 
control, as would be expected from true mutations. However, C→A, 
A→G, and C→T substitutions occur in comparable frequency in the 
20 and 120 generation trials, suggesting these substitutions reflect a 
continual process of base misincorporation and repair. We did not 
include these abundant A and T substitutions in our calculation of 
mutation rates. However, these findings suggest that the mechanism 
underlying the increase of AT content in E. coli grown for long periods” 
is a dynamic process of misincorporation and repair. 

We calculated short (<12 base pairs (bp)) indel rates in mrcA, rpoB 
UTR, rpoZ, and cspE ROIs (Fig. 2b). Indel rate varies widely by position 
and size. As might be expected’, 100% of the observed 1-bp indels 
occur at a site adjacent to a homopolymer. The frequency of 1-bp indels 
also increases with homopolymer length, potentially explaining why 
cspE, with an 8-bp T homopolymer, has the highest 1-bp indel rate. 
Longer indels are not localized to homopolymers and are positively 
correlated with substitution rates across all ROIs (Extended Data Fig. 6), 
supporting previous work suggesting that indels and substitutions 
spatially cluster in comparisons of genomes from divergent bacterial 
species”’. In all ROIs, deletions were detected at >10-fold frequency 
of insertions. 
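
The relationship between 1-bp indels and homopolymers described above can be explored with a small helper like the one below. It is a sketch, not the authors' analysis: the minimum run length that counts as a homopolymer (`min_len=3`) and the definition of "adjacent" (inside the run or immediately flanking it) are assumptions.

```python
import re


def homopolymer_runs(seq, min_len=3):
    """Return (start, end, base) for each run of >= min_len identical bases.

    `end` is exclusive, so the run length is end - start.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


def is_adjacent_to_homopolymer(pos, runs):
    """True if `pos` lies within a run or directly next to one (assumed rule)."""
    return any(start - 1 <= pos <= end for start, end, _ in runs)
```

For a sequence containing an 8-bp T run, such as the one noted in cspE, `homopolymer_runs` reports a single run of length 8, and indel positions can then be checked against it.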

Single nucleotide indels and longer frameshifting mutations were 
also observed in rpoB CDS, albeit at low frequency, even though such 
mutations should be deleterious. As expected, the rate of in-frame 
indels was higher than the rate of frameshift indels of >1-bp length 
(Fig. 2b). Because of the low rate of indel errors from in vitro polymer- 
ases used here’®, it is plausible that the observed frameshift mutations 
are from inviable bacteria, as DNA from such cells may still enter our 
protocol. The recovery of frameshift indels, as well as the nonsignif- 
icant difference between rates of synonymous and nonsynonymous 

Figure 3 | Substitution spectra. a, Frequency of base substitutions 
recovered in our sequencing protocol at t = 20 generations and 
t = 120 generations in rpoB CDS. Values are not normalized by number of 
generations and thus are true frequencies, not mutation rates. Experiments 
are biological quadruplicates. b, The high frequency of C→A substitutions 
is consistent even as R increases. If these substitutions were polymerase 
errors due to damaged nucleotides, they should decline with increasing 
R faster than the line representing a model in which the polymerase makes 
C→A errors with 50% frequency for a subpopulation of DNA molecules 
(see Supplementary Information: model of damaged base pairs). c, C→A 
substitutions in vivo cluster in nucleotides with at least two neighbouring 
Cs within a 2-bp radius (P < 0.01 by t-test), unlike polymerase errors. 
Error bars are 95% CI upper bound. 


substitutions in rpoB CDS (Supplementary Table 3), demonstrate that 
selection in our protocol is minimal. 

Exposing E. coli to sub-inhibitory doses of multiple classes of 
antibiotics increases the rate at which bacteria acquire resistance to 
rifampicin. Whether this increase is caused by nucleotide oxidation”!2, 
downregulation of mismatch repair”, or an unrelated pathway”, has 
become a topic of interest. We investigated the effect of sub-inhibitory 
doses of ampicillin and norfloxacin (a β-lactam and a fluoroquinolone, 
respectively) on mutation rate using MDS of rpoB CDS and mrcA, 
as well as detailed fluctuation assays'®”° (Fig. 4a). Addition of ampi- 
cillin increased the rate of transition mutations in rpoB, a signature 
indicative of downregulated mismatch repair*. In cells overexpressing 
catalase, basal mutation rate decreased by a factor of 8 (Fig. 4b), indi- 
cating that background oxidation contributes significantly to the basal 
mutation rate under non-stressed conditions. Addition of ampicillin 
during catalase overexpression did not increase this low rate (Fig. 4b). 
Overexpression of a catalase with inactivating point mutation H106Y 
did not confer similar mutagenic protection (Extended Data Fig. 7). 
These results together support a model in which ampicillin causes 
oxidative stress”), which acts upstream of downregulation of mismatch 
repair” to increase mutation rate. Consistently, cells grown in anaerobic 
conditions did not display an increase in transition rate when challenged 
with ampicillin (Extended Data Fig. 8a). The same was true in aerobic 
conditions if mismatch repair gene mutS was knocked out (Extended 
Data Fig. 8b; see Supplementary Information for further discussion). 

In contrast, exposure to norfloxacin increased the rate of >1-bp indel 
formation in both mrcA and rpoB (Fig. 4a). Norfloxacin inhibits DNA 
gyrase and can cause double-strand breaks in DNA”. This physical 
interaction thus directly causes antibiotic-induced mutagenesis in 
norfloxacin-treated cells. 

There is debate as to whether highly transcribed genes in bacteria 
have a higher!*”’ or lower* mutation rate than other genes. Our anal- 
ysis in E. coli shows that mrcA has a higher basal rate of mutation than 

Figure 4 | Relationships between mutation rates and physiologic 
conditions. a, Fold change in transversion (Tv), transition (Ts), and 
indel rate in response to ampicillin or norfloxacin according to MDS (for 
fluctuation assay results and raw substitution rates see Extended Data 
Fig. 9). b, Fold change in mutation rate in a strain overexpressing catalase 
(katG). c, Fold change in mutation rate of mrcA in response to induction 
via IPTG promoter. Experiments are biological quadruplicates. Error bars 
are 95% CI. SD, Shine-Dalgarno sequence. 


more highly transcribed genes. Yet interestingly, addition of ampicillin 
increased transversions and indel formation in mrcA, but not in 
rpoB CDS (Fig. 4a). It is known that mrcA undergoes mild induction 
upon addition of ampicillin’®. To study the effect of transcription on 
mutagenesis further, we created a strain in which a chromosomal copy 
of mrcA is regulated by an isopropyl β-D-1-thiogalactopyranoside 
(IPTG) promoter. Induction of mrcA transcription increased the 
frequency of all classes of mrcA substitution and indel ~8-fold more than 
when wild-type cells were exposed to ampicillin (Fig. 4c). These results 
suggest that although, in basal conditions, cells may have a means of 
protecting the most highly transcribed genes, co-directional collisions 
between transcription and replication machinery, which can cause 
double-strand breaks”’, are themselves mutagenic. Induction itself may 
thus be an important mechanism of stress-induced mutagenesis*”. 

The low translation rate of mrcA, coupled with our finding that rpoB 
UTR has a higher rate of mutation than the CDS, suggests that transla- 
tion may be protective for highly transcribed genes. We constructed an 
additional strain in which IPTG-regulated mrcA has a canonical Shine- 
Dalgarno sequence and start codon, rather than its low-translation 
endogenous sequence. Increasing translation decreased substitution 
rate in the IPTG-induced state by 50%, and by 75% 
when high-frequency (C→A, for example) substitutions are excluded 
(Fig. 4c). Although translation does not lower the mrcA mutation 
rate to rpoB levels, it probably contributes to protection of highly 
transcribed genes (Supplementary Information: relationship between 
transcription, translation, and mutation rate). 

Straightforward extensions to MDS would allow for analysis of many 
ROIs simultaneously and assembly of longer ROIs (Supplementary 
Information: MDS protocol). MDS may also be useful in detection of 
genetic abnormalities in cell-free DNA due to foetal mutations or cancer. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
Maximum-depth sequencing. First, genomic DNA is treated with a restriction 
enzyme, which cleaves at the 3′ end of the ROI. A single PCR cycle is performed 
with barcoded primers annealing to the 3’ end of the ROI. Because of the exposed 
3’ site on the genomic DNA molecule left by the restriction enzyme, the genomic 
DNA molecule acts as a ‘primer’, causing the barcode and an adaptor to be 
synthesized onto the end of the ROI. This synthesis effectively attaches the barcode 
to the original genomic DNA molecule. Unused barcoded primers are removed, 
and N cycles of linear amplification are performed using only primers to the for- 
ward adaptor sequence. This step is important for screening polymerase errors. 
The polymerase may make an error in any single round of synthesis, increasing 
the probability of generating a faulty read by a factor of N, but by copying the same original 
DNA molecule multiple times, the probability of recovering a defective copy after 
analysis is reduced by a factor of N^R, where R is the number of independent reads 
used to build a consensus sequence. Thus the total error reduction is 1/N^(R−1)-fold 
(see below). In this study, typically N = 12 and R > 3, although the empiric value 
of N after accounting for inefficiencies in PCR is somewhat lower (see Extended 
Data Fig. 2 and Supplementary Information: testing sample preparation and PCR 
efficiency). Note that one could also attach a second barcode to each read after 
linear amplification but before exponential amplification—doing so could allow 
one to reduce the error rate even further by ensuring multiple reads from the 
linear amplification step are used in the analysis. By targeting a ROL, we can also 
use paired-end sequencing to increase yield. Detailed error rate spectra for both 
Phusion and Q5 polymerase are measured and reported in Supplementary Table 1. 
It should be noted that when R> 2, MDS errors such as those shown in Fig. 1 
are derived almost entirely from transition substitutions typical of PCR polymer- 
ases, and that for other kinds of substitutions, error rate is virtually nonexistent. 
In MDS, each read represents additional 1 x coverage of the ROI. Thus, MDS 
can achieve ~10°-fold coverage using an Illumina HiSeq machine. For details 
on the specific enzymes, primers, and PCR conditions used in this study see 
Supplementary Information: MDS protocol. For details on consensus base calling, 
see Supplementary Information: analysis. 
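The barcode-family idea described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline (their consensus caller is described in the Supplementary Information): the read representation as `(barcode, sequence)` tuples, the function name `call_consensus`, the `min_reads` parameter and the rule that a base is called only when every read in the family agrees are all assumptions made for this example.

```python
from collections import defaultdict


def call_consensus(reads, min_reads=3):
    """Group (barcode, sequence) reads into families and call a consensus.

    Assumption for this sketch: a position is called only if every read in
    the family agrees; otherwise it is reported as 'N'. Families with fewer
    than min_reads reads are dropped.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_reads:
            continue
        called = []
        # zip(*seqs) walks the reads column by column (same ROI assumed).
        for column in zip(*seqs):
            called.append(column[0] if len(set(column)) == 1 else "N")
        consensus[barcode] = "".join(called)
    return consensus


# Hypothetical toy reads: one clean family, one family with a disagreement,
# and one family that is too small to be used.
reads = [
    ("AAGT", "ACGT"), ("AAGT", "ACGT"), ("AAGT", "ACGT"),
    ("CCTA", "ACGT"), ("CCTA", "ACTT"), ("CCTA", "ACGT"),
    ("GGGG", "ACGT"),
]
print(call_consensus(reads))
```

Requiring unanimity is the strictest possible rule; it matches the false-positive argument in the next section, in which an error survives only if all R reads carry it.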

Error rate of MDS. Sources of error include damaged DNA during extraction, 
polymerase errors during PCR, and sequencing errors. Because our goal is to 
identify rare mutants, we consider error as the rate of false positives, which affect 
mutant frequency to a much larger extent than false negatives’. 

If the probability of a single nucleotide X being misread as Y owing to polymerase
error is $P_{\mathrm{pol},X\to Y}$ and the rate of the corresponding sequencing error is
$P_{\mathrm{seq},X\to Y}$, then the probability that X will be read as Y owing to either
source of error in a standard sequencing protocol is

$$P_{X\to Y} = P_{\mathrm{pol},X\to Y} + P_{\mathrm{seq},X\to Y} \qquad (1)$$


As discussed briefly in the main text, in our assay, the total polymerase error
rate $E_{\mathrm{pol},X\to Y}$ can be derived as follows (for visual aid, see Extended Data Fig. 10).
For convenience, $P_{\mathrm{pol},X\to Y}$ will hereafter be referred to as p. After exponential PCR,
there are N pools of reads, each derived from one of the original linear amplification
steps. The probability of having k pools derive from an original polymerase
error is binomially distributed. Furthermore, because $Np \ll 1$, the distribution is
approximately Poisson.

$$\binom{N}{k} p^{k} (1-p)^{N-k} \approx \frac{(Np)^{k} e^{-Np}}{k!} \qquad (2)$$


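The probability of a false positive is the probability that all R reads used to form
a consensus came from one of the k ‘error’ pools

$$E_{\mathrm{pol},X\to Y} = \sum_{k} \frac{(Np)^{k} e^{-Np}}{k!} \left(\frac{k}{N}\right)^{R} = \frac{1}{N^{R}} \sum_{k} \frac{(Np)^{k} e^{-Np}}{k!}\, k^{R} = \frac{M_{R}}{N^{R}} = \frac{B_{R}(Np)}{N^{R}} \qquad (3)$$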


where $M_{R}$ is the Rth moment of the Poisson distribution in equation (2) and $B_{R}$
is the Rth Bell polynomial. Because Np < 1, an upper bound on this error formula
can be written as follows:

$$E_{\mathrm{pol},X\to Y} = \frac{B_{R}(Np)}{N^{R}} \le \frac{p\,B_{R}(1)}{N^{R-1}} < \left(\frac{0.792R}{\ln(R+1)}\right)^{R} \frac{p}{N^{R-1}} \qquad (4)$$

where the upper bound of the Rth Bell number $B_{R}(1)$ is from ref. 31. These bounds
will decrease rapidly as R increases, given that R<N.
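Equations (3) and (4) can be checked numerically. The sketch below evaluates the Bell (Touchard) polynomial through Stirling numbers of the second kind and compares the exact expression with the bound. The function names are our own, and the per-cycle error probability `p = 1e-6` is an illustrative placeholder rather than a value measured in this study; N = 12 and R = 3 follow the typical values quoted above.

```python
import math


def stirling2_row(n):
    """Return [S(n,0), ..., S(n,n)], Stirling numbers of the second kind."""
    row = [1]  # S(0,0) = 1
    for i in range(1, n + 1):
        new = [0] * (i + 1)
        for j in range(1, i + 1):
            prev_j = row[j] if j < len(row) else 0
            # Recurrence: S(i,j) = j*S(i-1,j) + S(i-1,j-1)
            new[j] = j * prev_j + row[j - 1]
        row = new
    return row


def bell_polynomial(n, x):
    """B_n(x) = sum_j S(n,j) x^j, the nth moment of a Poisson with mean x."""
    return sum(s * x**j for j, s in enumerate(stirling2_row(n)))


def exact_error(p, N, R):
    """Equation (3): B_R(Np) / N^R."""
    return bell_polynomial(R, N * p) / N**R


def bound_error(p, N, R):
    """Right-hand side of equation (4)."""
    return (0.792 * R / math.log(R + 1)) ** R * p / N ** (R - 1)


p, N, R = 1e-6, 12, 3  # p is an assumed, illustrative value
print(exact_error(p, N, R), bound_error(p, N, R), p / N ** (R - 1))
```

For small Np the exact value is dominated by the single-error term p/N^(R-1), which is why the Bell number factor can later be dropped.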

We note that in practice, the probability that the same error would emerge in 
k>1 reads produced by the linear amplification step is ~10~', so low that the 
expected number of such multi-errors for all the nucleotides sequenced in this 
study is <1. With this in mind, it is possible to simplify equation (4) so that the
Bell number term is a non-contributor to the total error. Under this assumption,
the probability of false positive is

$$E_{\mathrm{pol},X\to Y} \approx \frac{P_{\mathrm{pol},X\to Y}}{N^{R-1}} \qquad (5)$$


The above formula for $E_{\mathrm{pol},X\to Y}$ only takes into account errors introduced
during linear amplification. However, the maximum error that could be contributed
during a subsequent round of doubling, or exponential, PCR (D) can be found
by substituting $N2^{D}$ for N in the equation above. The sum of all possible errors from
all rounds of PCR would thus be

$$E_{\mathrm{total\,pol},X\to Y} \approx E_{\mathrm{pol},X\to Y} + \sum_{D \ge 1} \frac{P_{\mathrm{pol},X\to Y}}{(N2^{D})^{R-1}} \qquad (6)$$

For R = 2, this will be a geometric series with sum no greater than $2E_{\mathrm{pol},X\to Y}$. For
R > 2, the sum will be closer to $E_{\mathrm{pol},X\to Y}$.
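The geometric-series statement can be verified directly. In this sketch the function name, the number of exponential cycles (20) and the value of p are assumptions chosen for illustration, and the sum over doubling rounds is taken to start at D = 1.

```python
def total_polymerase_error(p, N, R, exp_cycles):
    """Linear-amplification error plus the worst-case contribution of each
    exponential (doubling) round D = 1..exp_cycles, following equation (6)."""
    linear = p / N ** (R - 1)
    exponential = sum(p / (N * 2**D) ** (R - 1) for D in range(1, exp_cycles + 1))
    return linear + exponential


p, N = 1e-6, 12  # p is an assumed, illustrative per-cycle error probability
for R in (2, 3, 4):
    linear_only = p / N ** (R - 1)
    ratio = total_polymerase_error(p, N, R, exp_cycles=20) / linear_only
    print(R, ratio)  # ratio of total error to the linear-amplification term
```

For R = 2 the ratio approaches but never reaches 2; for R = 3 it is close to 4/3, and it moves towards 1 as R grows.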
The error rate of sequencing after forming a barcode, as discussed thoroughly
in other texts, is the probability that the same error happens R′ times

$$E_{\mathrm{seq},X\to Y} = (P_{\mathrm{seq},X\to Y})^{R'} \qquad (7)$$

where R′ is the number of ‘not necessarily independent’ reads used to form a
consensus (that is, overlapping paired-end sequences of the same read are
included). If single-end sequencing is used, R′ = R. If paired-end sequencing is
used, a maximum of R′ = 2R not necessarily independent reads are used.

Alternatively, one could estimate $E_{\mathrm{seq},X\to Y}$ based on the sum of the quality scores
of the R reads contributing to the consensus, but in practice we find this to be 
unnecessary because sequencing errors are not the major contributor to overall 
error when R > 2. 
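The quality-score alternative mentioned above follows from the standard Phred definition, P = 10^(−Q/10): multiplying the per-read error probabilities is the same as summing the quality scores. The sketch below is an illustration under the simplifying assumption that the reads err independently, and it ignores the requirement that all reads err to the same base, so it is an upper-end estimate; the function names are hypothetical.

```python
import math


def phred_to_prob(q):
    """Standard Phred definition: error probability for quality score q."""
    return 10 ** (-q / 10)


def consensus_error_from_quals(quals):
    """Probability that every contributing read is wrong at a position,
    assuming independent errors: the product of the per-read probabilities,
    which equals 10^(-sum(Q)/10)."""
    return 10 ** (-sum(quals) / 10)


def consensus_error_fixed_rate(p_seq, r_prime):
    """Equation (7) with a single per-read error rate p_seq."""
    return p_seq**r_prime


quals = [30, 30, 30]  # hypothetical base qualities of three reads
print(consensus_error_from_quals(quals))
print(consensus_error_fixed_rate(phred_to_prob(30), 3))
```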

The total error rate for any given nucleotide position is the sum of all
$E_{X\to Y}$, X ≠ Y, for a given X. The values reported in the main text and Fig. 1c are
total error. Raw polymerase and sequencing error rates? are shown in 
Supplementary Table 1. Note that this model is also the basis for the damaged 
base-pair analysis presented in Fig. 3b and the Supplementary Information. 
Growth and mutation rate analysis. E. coli were streaked onto Luria-Bertani
(LB) agar from freezer stocks and grown at 30 °C for 24 h. According to plating
and colony-forming unit (c.f.u.) counting, the average number of cells in such
colonies is $3 \times 10^{8}$ (thus the number of generations is $\ln(3 \times 10^{8}) = 19.5$). Bacteria
from a single colony were used to inoculate a small liquid culture (1 ml LB broth in
a round-bottom tube). For the purposes of generation counting, it is assumed that
after the transition to growing in liquid, growth occurs for only ~3 generations.
The culture was grown in a 37 °C shaker to allow for the transition to growth in
broth for 12 h, after which a measurable optical density could be reliably detected.

4 µl (~$10^{7}$ bacteria) were transferred to a fresh 100 ml LB liquid culture
(in a 250 ml Erlenmeyer flask). Liquid cultures were grown for 24 h on a 37 °C
shaker, to a density of $2.5 \times 10^{9}$ bacteria per ml according to cell counts (for a
total of $2.5 \times 10^{11}$ bacteria). This process was repeated 9 times. The average number
of generations a bacterium would have grown in each liquid culture is

$$\ln\left(\frac{2.5 \times 10^{11}}{10^{7}}\right) = 10.1 \text{ generations} \qquad (8)$$

Thus the average total number of generations g is 19.5 + 3 + 9 × 10.1 = 113.
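The generation count can be reproduced with a short calculation. The sketch follows the text in using the natural logarithm and in rounding each term to one decimal place before summing; the variable names are our own.

```python
import math

cells_per_colony = 3e8      # average colony size from c.f.u. counts
inoculum = 1e7              # bacteria transferred at each passage
final_population = 2.5e11   # bacteria in a saturated 100 ml culture
liquid_transition = 3       # generations assumed for the starter culture
passages = 9

colony_generations = round(math.log(cells_per_colony), 1)
passage_generations = round(math.log(final_population / inoculum), 1)
total = colony_generations + liquid_transition + passages * passage_generations

print(colony_generations, passage_generations, round(total))
```

Carrying full precision through the sum instead of rounding each term gives about 113.7, so the quoted total of 113 depends on the intermediate rounding.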

In addition to the large passage size, we stop passaging hundreds of genera- 
tions before selective sweeps are expected to occur* and, importantly, long before 
selection for a hyper-mutating strain might be expected®. We also performed 
simulations to estimate the probability that any two bacteria have the same
founder given the expected passage size (see Supplementary Information:
calculation of mutation rate).

Mutation rates $\mu$ in our assay are chosen to maximize the likelihood of recovering
the mean mutant frequency f for substitutions of a given type X→Y, which
we find are well approximated by a Poisson process over a certain number of
generations (in this case 113).


fy y= (9) 


More precisely, the frequency is defined as the number of barcode groups with
a given mutation divided by the total number of barcode groups under consideration.
For example, if R > 3, then f(Y) is the number of read families of size > 3
with mutation Y divided by the total number of read families of size > 3. Mutation
rates given in Fig. 2 are computed from the average across all X of
$\sum_{Y \ne X} \mu_{X\to Y}$, with C→A, G→T, C→T, and G→A substitutions excluded for
aforementioned reasons. In their place, a correction term (the average transversion
or transition rate based on all other substitutions) is used so that the mutation
rate is not systematically underestimated.
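The frequency definition above amounts to a ratio of family counts. The following sketch makes that explicit; the data layout (a list of `(family_size, set_of_mutations)` pairs), the mutation labels and the inclusive `min_size` cutoff are assumptions made for illustration, not the authors' data format.

```python
def mutant_frequency(families, mutation, min_size):
    """Fraction of eligible barcode families that carry `mutation`.

    families: iterable of (family_size, set_of_mutations) pairs (assumed layout).
    min_size: smallest family size that is counted (treated as inclusive here).
    """
    eligible = [muts for size, muts in families if size >= min_size]
    if not eligible:
        return 0.0
    carriers = sum(1 for muts in eligible if mutation in muts)
    return carriers / len(eligible)


# Hypothetical toy data: four families, one of which is too small to count.
families = [
    (5, {"C1530T"}),
    (3, set()),
    (2, {"C1530T"}),
    (4, set()),
]
print(mutant_frequency(families, "C1530T", min_size=3))
```

Both the numerator and the denominator are restricted to families that pass the size cutoff, so raising the cutoff changes the yield but should not bias the frequency.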

Four biological replicates of each condition were grown. All liquid cultures, 
including the small founding culture, had the possible addition of 1 ,1m1~! ampi- 
cillin or 15ngml~! norfloxacin. Cultures for the short-term growth assay and 
mock culture were grown similarly except without passaging (Supplementary 
Information: mock culture and short growth assay). 

Strains. MG1655 E. coli were used as wild-type cells for all experiments. The 
IPTG-regulated mrcA strain MG1655 and IPTG-regulated strain with modi- 
fied Shine-Dalgarno were recombineered according to ref. 33. For details, see 
Supplementary Information: strains. For details on the catalase overexpression 
mutant and inactive H106Y catalase overexpression mutant see ref. 22. In the 
mutS knockout strain, MG1655 mutS was replaced with a kanamycin resistance 
cassette. 

Preparation of DNA samples 

Genomic DNA. Up to 5 ml of bacterial liquid culture were spun down (see
later section for specific growth conditions). Cells were resuspended in 500 µl
Tris-EDTA buffer (pH 7.5), and 1,000 units of Ready-Lyse (Epicentre) were added
before incubation at room temperature for 1 h and freezing at −80 °C overnight.
Genomic DNA extraction was performed using Qiagen genomic tip (100G),
but without lysozyme, and quantified using Nanodrop.

In vitro DNA. Single-stranded oligonucleotides with sequences corresponding 
to MG1655 rpoB at position 1511-1632 and mrcA at 1258-1379 were ordered 
from IDT and resuspended in deionized water. These oligonucleotides were used 
directly as input to the MDS protocol above for calculation
of error rate in Fig. 1 and the ‘negative control’ rows in Supplementary Table 1. 
Note, as expected from quality control reports from IDT, we found a large number 
of indels in the in vitro synthesized templates (~1% of molecules had some type of 
indel). However, the fact that we recovered a low substitution rate could be used 
to confirm the chemical purity of the mononucleotide pools used for synthesis 
by IDT. 

Separately, 10 ng of the same DNA oligonucleotides were used as templates 
for a standard 20-cycle exponential PCR reaction with only the ROI-annealing 
component of the forward and reverse primers above using either Q5 or Phusion 
polymerase. The amplified DNA was used as input into the MDS protocol and 
used to calculate the intrinsic substitution error rate of those two polymerases as 
reported in Supplementary Table 1. 

Sequencing depth. On average, we divide single HiSeq Rapid Runs of ~240 M
reads into four different ‘conditions’, each corresponding to a particular ROI from
bacteria grown under a certain condition. The ~60 M reads of each condition are
further subdivided in order to process triplicate or quadruplicate trials.

We recover ~2.5 M total barcode ‘families’ for each condition using our threshold
of R>3 (for the purposes of calculating total yield, we divide by 2 since each
read is paired-end sequenced). We examine ~100 bp per ROI, thus providing a
significant pool from which to observe mutations. There is an interesting level of
variability across quadruplicates, likely due to stochastic variation when combining
and purifying DNA samples and in binding to the HiSeq flowcell itself (Extended
Data Fig. 1b, c). Note that when mutation frequencies are averaged over multiple
trials, each trial is weighted according to its relative representation in terms of
number of families.
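The weighting described above is an ordinary weighted mean, with the number of barcode families in each trial as the weight. A minimal sketch, with hypothetical trial values and a function name of our own:

```python
def weighted_mean_frequency(trials):
    """Average mutation frequency over trials, each weighted by its number
    of barcode families. trials: list of (frequency, n_families) pairs."""
    total_families = sum(n for _, n in trials)
    return sum(f * n for f, n in trials) / total_families


# Hypothetical quadruplicate: (mutant frequency, families recovered)
trials = [(1.0e-6, 400_000), (3.0e-6, 800_000), (2.0e-6, 700_000), (2.5e-6, 600_000)]
print(weighted_mean_frequency(trials))
```

Weighting by family count gives the same result as pooling the families from all trials before computing a single frequency, so a trial with poor yield cannot dominate the average.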

Fluctuation assays. Fluctuation assays were carried out as in ref. 16. We picked
single colonies of E. coli as above and grew them in 1 ml Luria-Bertani (LB) broth
overnight. 0.1 µl from this starter culture was used to inoculate 25 separate trial
cultures. Each trial culture was grown (in a 37 °C shaker) either to an optical
density (OD600) of 0.3 (for exponential growth trials) in 2 ml LB broth or for 24 h
(for saturation) in 0.2 ml LB broth, and was then plated on Petri dishes containing LB
agar with 100 mg ml−1 rifampicin. Colonies were grown for 48 h at 30 °C and c.f.u.
were counted. The rpoB region conferring rifampicin resistance was sequenced
and used to compute the mutational profiles in Fig. 4. Number of bacteria per
culture was calculated by serial dilution, plating on LB agar, and counting c.f.u.
after 48 h growth at 37 °C. Mutation rates and 95% CIs were computed using the
Ma-Sandri-Sarkar method as implemented in ref. 25. Broth was possibly supple- 
mented 1 jl m1~! ampicillin, 15ng m1“! norfloxacin, or 250ngml! gentamycin. LB 
broth was placed in an LS-580 anaerobe chamber (Anaerobe Systems) overnight 
to yield anaerobic media. 
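For readers unfamiliar with the Ma-Sandri-Sarkar approach, the core of it is a recursion for the Luria-Delbrück distribution of mutant counts, followed by a maximum-likelihood search for the expected number of mutations per culture, m. The sketch below is a bare-bones illustration and not the implementation of ref. 25: the function names, the golden-section search, its bounds and the example colony counts are all assumptions, and confidence intervals are omitted.

```python
import math


def ld_pmf(m, r_max):
    """Luria-Delbruck probabilities p_0..p_r_max for m mutations per culture,
    using the recursion p_0 = exp(-m), p_r = (m/r) * sum_i p_i / (r - i + 1)."""
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        p.append((m / r) * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def log_likelihood(m, counts):
    """Log-likelihood of the observed mutant counts per culture."""
    p = ld_pmf(m, max(counts))
    return sum(math.log(p[c]) for c in counts)


def mss_mle(counts, lo=1e-3, hi=50.0, tol=1e-6):
    """Golden-section search for the m that maximizes the likelihood
    (assumes a single maximum between lo and hi)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d
        else:
            a = c
        c, d = b - phi * (b - a), a + phi * (b - a)
    return (a + b) / 2


# Hypothetical resistant-colony counts from 25 parallel cultures
counts = [0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 0, 3, 0, 1, 0, 0, 12, 0, 2, 1, 0, 0, 1, 0, 4]
m_hat = mss_mle(counts)
print(m_hat)
```

A per-generation mutation rate is then commonly obtained by dividing the estimated m by the number of cells per culture; that final step is stated here as general practice, not as a description of the exact procedure used in the study.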

Availability. Raw sequence data are available from Sequence Read Archive 
(SRA301985). Code is available from https://github.com/justinjee/MDS and 
https://github.com/susinmotion/barcode_tries. 
Extended Data Figure 1 | MDS accuracy and yield. a, Mock culture
composed of rpoB point mutants of known concentration was sequenced
using MDS. Output concentrations of each point mutant recovered
from R=2 analysis are plotted against its input concentration (see
Supplementary Information Table 2 for details). b, c, Distribution of the
sizes of barcode families in four trials, shown as log10(number of barcode
families) per trial versus size of barcode family in reads (R). b, Trials used
for the calibration run shown in a (~100 M reads total, divided into four
trials). c, Representative quadruplicate trials (from rpoB of wild-type
bacteria grown in LB broth with no antibiotics) taking up a total of one
quarter of the output of a HiSeq rapid run, a total of ~60 M reads.
Extended Data Figure 2 | Dual-barcode MDS. a, Barcodes are
attached to original DNA molecules as per MDS protocol. After linear
amplification, a second barcode is attached to the opposite end of each
read (see Supplementary Information: testing sample preparation and PCR
efficiency). Exponential PCR is then performed. In the analysis phase,
reads can be grouped both by primary barcode (that is, a classic MDS
barcode family) and a second barcode corresponding to a ‘subfamily’ of
reads with the same parent from a particular linear amplification step
before exponential amplification. b, The probability that for a given family
only reads of one subfamily are recovered (a ‘homogenous’ barcode) 
decreases exponentially with R. For example, for R= 3, the probability all 
3 reads are of the same subfamily is 0.02. c, We show the number of reads 
in each subfamily, sorted within each column by subfamily size, for the 
1,500 largest primary barcode families in the experiment. For families of 
such size, it is unlikely that a single subfamily will account for more than 
25% of the total number of reads recovered from that family. 
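The figure quoted in panel b can be related to the number of linear-amplification cycles with a back-of-envelope model. The sketch assumes, purely for illustration, that every subfamily is equally represented, in which case the chance that all R reads come from one subfamily is N × (1/N)^R; the function names and the equal-size assumption are ours, not part of the legend.

```python
def prob_single_subfamily(n_subfamilies, r):
    """Probability that r reads drawn from n equally sized subfamilies all
    come from the same one: n * (1/n)**r = 1 / n**(r - 1)."""
    return n_subfamilies * (1 / n_subfamilies) ** r


def effective_subfamilies(prob, r):
    """Invert the relation above to get the subfamily number implied by an
    observed probability."""
    return (1 / prob) ** (1 / (r - 1))


print(prob_single_subfamily(12, 3))    # nominal N = 12 cycles
print(effective_subfamilies(0.02, 3))  # value implied by the reported 0.02
```

Under this simple model the reported 0.02 at R = 3 corresponds to roughly seven equally sized subfamilies rather than the nominal 12, in line with the Methods statement that the empiric value of N is somewhat lower than the number of cycles.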
Extended Data Figure 3 | Substitution frequency controls.
a, Empirically, average substitution frequency (with high frequency
substitutions such as C→A excluded) stabilizes as R increases. Note,
substitution frequencies are not normalized by number of generations.
b, Empirical sequencing C→A error rate at C→A mutational hotspots
with neighbouring Cs (same as those in Fig. 3c) versus all other positions.
c, C→A substitution frequencies when 10% 8-oxoG is synthetically added
to in vitro DNA and in FPG-treated samples. Frequencies are reported
from ROI positions with potential 8-oxoG incorporations as described in
template ‘rpoB_reverse_complement_8-oxo-Dg’. Frequencies are reported
at R=2 level. For R> 2, no C→A substitutions were found in 72,646
in vitro template sites. Data represent biological triplicates. Error bars are
standard deviation.
[Figure: panels a, rpoB CDS in vivo; b, rpoB CDS in vitro; c, mrcA in vivo;
d, mrcA in vitro; x axis, ROI nucleotide sequence]

Extended Data Figure 4 | Substitution rates per locus. Positive frequencies
denote synonymous substitutions. Negative frequencies denote
nonsynonymous substitutions. a, c, Values are averaged across quadruplicate
trials. b, d, In vitro synthesized DNA has undergone 20-cycle PCR
amplification using Q5 polymerase.
Extended Data Figure 5 | Mutational spectra and contexts.
a, Substitution frequencies of all ROIs after ~120 generations of growth.
Note that values are not normalized for the number of generations and are
thus true frequencies, rather than mutation rates. b, Mutation frequencies
are shown in context of their 5′ (A, C, G, or T on the x axis) and
3′ (A, C, G, or T on the y axis) neighbours. c, The relative relationship
between in vivo substitution frequencies and expected errors due to
sequencing and PCR (from in vitro DNA assays) is poorly described
by a linear approximation (R² = 0.27). Furthermore, the recovered
frequency from in vivo substitutions (R = 3) is higher than the rate of error
(equivalent frequencies would be represented by the dotted line), even
with the relatively relaxed read-cutoff threshold of R = 2 (the sequencing +
PCR error with an R = 3 cutoff is approximately an order of magnitude
lower). Templates are rpoB CDS and mrcA ROIs.
Extended Data Figure 6 | Comparing substitution rate and indel rate across 5 ROIs reveals a positive correlation. Pearson correlation
coefficient = 0.76. 
Extended Data Figure 7 | Rate of rifampicin resistance per generation.
a-d, As calculated in fluctuation assays in wild-type cells grown in
exponential phase only (a), wild-type cells grown to saturation (b), katG
overexpression mutant grown to saturation (c) and inactive katG (H106Y
point mutation) overexpression mutant grown to saturation (d). Where
indicated, LB broth was supplemented with subinhibitory doses of
ampicillin (amp), norfloxacin (nor), or gentamycin (gen). Rates are mean.
Error bars are 95% CI. N= 25 (see Methods: fluctuation assays).
Extended Data Figure 8 | Transversion and transition rates (per nucleotide-generation). As calculated in fluctuation assays in anaerobic conditions
(a) and in a mutS knockout (b). Note that because the transition (Ts) rate was high in mutS strains, transversion mutations could not be detected. Rates
are mean. Error bars are 95% CI. N= 25 (see Methods: fluctuation assays).
Extended Data Figure 9 | Rates of rpoB and mrcA substitutions in the
presence of antibiotics as calculated by MDS. Asterisks indicate cultures
grown separately and prepared with Phusion rather than Q5. Although
not shown, we note that only in-frame (3×) indels were observed in
rpoB in fluctuation assays, as expected since frameshift indels would be
deleterious. These increased in frequency by a factor of 10 on addition of
norfloxacin.
Extended Data Figure 10 | Schematic depicting the mathematical
derivation of the false positive rate of MDS due to polymerase error.
a, The origin of various terms used in equations (2)-(7). b, Illustration of
an example calculation of false positive rate given more ‘intuitive’ values
of N, R and P. The false positive rate is calculated in a way that accounts
for the possibility that an error in one or more ‘linear’ cycles propagates to
a whole family of reads. The number of reads with an error (k) is Poisson 
distributed according to equation (2). The probability of a false positive is 
the sum of the probabilities that all R reads come from one of k families, 
for all possible k, according to equation (3). Note that in practice, P< 10~°, 
and in our study N= 12, R>2, making the false positive rate much lower 
(see Fig. 1). 
Host-mediated sugar oxidation promotes
post-antibiotic pathogen expansion 
Sean-Paul Nuccio¹, Tamding Wangdi¹, Oliver Fiehn²,³, Renée M. Tsolis¹ & Andreas J. Bäumler¹


Changes in the gut microbiota may underpin many human diseases, 
but the mechanisms that are responsible for altering microbial 
communities remain poorly understood. Antibiotic usage elevates 
the risk of contracting gastroenteritis caused by Salmonella 
enterica serovars', increases the duration for which patients shed 
the pathogen in their faeces, and may on occasion produce a 
bacteriologic and symptomatic relapse”*. These antibiotic-induced 
changes in the gut microbiota can be studied in mice, in which the 
disruption of a balanced microbial community by treatment with 
the antibiotic streptomycin leads to an expansion of S. enterica 
serovars in the large bowel*. However, the mechanisms by which 
streptomycin treatment drives an expansion of S. enterica serovars 
are not fully resolved. Here we show that host-mediated oxidation 
of galactose and glucose promotes post-antibiotic expansion of 
S. enterica serovar Typhimurium (S. Typhimurium). By elevating 
expression of the gene encoding inducible nitric oxide synthase 
(iNOS) in the caecal mucosa, streptomycin treatment increased 
post-antibiotic availability of the oxidation products galactarate and 
glucarate in the murine caecum. S. Typhimurium used galactarate 
and glucarate within the gut lumen of streptomycin pre-treated 
mice, and genetic ablation of the respective catabolic pathways 
reduced S. Typhimurium competitiveness. Our results identify 
host-mediated oxidation of carbohydrates in the gut as a mechanism 
for post-antibiotic pathogen expansion. 

A recent in silico analysis suggests that pathways involved in galac- 
tarate uptake and catabolism are associated with S. enterica serovars 
that cause gastrointestinal disease®. Galactarate fermentation is one of 
the biochemical reactions used to differentiate members of the genus 
Salmonella into serovars. Although 98.2% of serovars associated with 
gastrointestinal infections can ferment this carbon source, only 15.4% 
of serovars associated with extraintestinal disease test positive for this 
reaction® (Extended Data Fig. 1a). However, the biological significance 
of this association is not clear, because galactarate is a xenobiotic that 
is not normally produced by mammals or expected to be present 
within the diet. We therefore investigated the origin of galactarate in 
the intestine. 

Consistent with the idea that galactarate is a xenobiotic, the con- 
centration of this sugar in mouse chow was very low, as suggested 
by gas chromatography/mass spectrometry (GC/MS) measurements 
(Extended Data Fig. 1b). To investigate whether this nutrient is 
normally available to promote growth in mucus, we constructed a 
S. Typhimurium strain lacking the gudT ygcY gudD STM2959 operon 
(gudT-STM2959 mutant, Extended Data Fig. 1c), which encodes 
proteins involved in galactarate uptake and catabolism’. Expression of 
the gudT ygcY gudD STM2959 operon in S. Typhimurium is induced 
by hydrogen, a fermentation product of the gut microbiota®. Deletion 
of galactarate utilization genes rendered S. Typhimurium unable 
to ferment galactarate and glucarate, but did not affect its ability 
to utilize other monosaccharides (Fig. 1a). Genetic ablation of the 


galactarate/glucarate utilization genes did not reduce the fitness of 
S. Typhimurium for anaerobic growth on hog mucin as the sole 
carbon source, but fitness of the gudT-STM2959 mutant was reduced 
compared to the wild type when galactarate or glucarate was added to 
the medium (Fig. 1b). These data suggested that neither the diet nor 
the mucus naturally contained biologically relevant quantities of a sub- 
strate for enzymes encoded by the gudT ygcY gudD STM2959 operon. 

We next investigated the contribution of the gudT ygcY gudD 
STM2959 operon to post-antibiotic pathogen expansion. Treatment 
of mice with a single dose of streptomycin one day before infection 
(pre-treatment with streptomycin) increased recovery of the wild-type 
S. Typhimurium from the colon contents of mice by approximately 
one order of magnitude compared to animals that had not received 
antibiotics (P < 0.05) (Fig. 1c). Genetic ablation of the galactarate/ 
glucarate utilization genes significantly (P < 0.05) reduced recovery 
of S. Typhimurium from streptomycin pre-treated mice, but not from 
mice that had not received antibiotics. Genetic complementation with 
a plasmid carrying the cloned gudT ygcY gudD STM2959 genes restored 
recovery of the gudT-STM2959 mutant from streptomycin pre-treated 
mice to levels observed with the wild-type S. Typhimurium. Collectively, 
these data provided genetic evidence for a contribution of the gudT ygcY 
gudD STM2959 operon to post-antibiotic expansion of S. Typhimurium. 

Preconditioning of mice with streptomycin increases the severity of 
S. Typhimurium induced colitis’. We therefore investigated whether 
the availability of galactarate and/or glucarate is elevated during severe 
colitis, a host response triggered by the action of two type III secretion 
systems (T3SS-1 and T3SS-2), which constitute the main virulence 
factors of S. Typhimurium®"”. To prevent the generation of acute intes- 
tinal inflammation, we used avirulent S. Typhimurium strains lacking 
a functional T3SS-1 (due to a mutation in invA) and T3SS-2 (due to a
mutation in spiB). Streptomycin pre-treated mice were infected either 
with a 1:1 mixture of the wild-type bacteria and a gudT-STM2959 
mutant or with a 1:1 mixture of an invA spiB mutant and an invA spiB 
gudT-STM2959 mutant. In each competition, the galactarate/glucarate 
utilization-proficient strain (the wild-type bacteria or the invA spiB 
mutant) was recovered in higher numbers than the corresponding 
galactarate/glucarate utilization-deficient strain (the gudT-STM2959 
mutant or the invA spiB gudT-STM2959 mutant) (Fig. 2a). However, 
only mice infected with a mixture of the wild-type bacteria and a gudT-STM2959
mutant developed acute intestinal inflammation (Extended
Data Fig. 2 and Extended Data Tables 1 and 2). When the experiment 
was repeated with mice that had not received streptomycin, the presence 
of genes for galactarate/glucarate utilization no longer conferred a 
fitness advantage (Fig. 2a). To distinguish between glucarate and 
galactarate as possible carbon sources, we inactivated gudD, encoding 
glucarate dehydratase, or garD, encoding galactarate dehydratase 
(Extended Data Fig. 1c). During in vitro growth, genetic ablation of 
gudD only reduced S. Typhimurium fitness in medium containing 
glucarate, while deletion of the garD gene only reduced fitness in 
Figure 1 | The operon for galactarate utilization contributes to post-antibiotic
expansion of S. Typhimurium. a, The ability of the indicated
S. Typhimurium strains to ferment the indicated monosaccharides was
detected using a pH indicator. Fermentation of the sugar is indicated
by a colour change from blue to yellow. b, Minimal medium or mucin
broth supplemented with the indicated carbon sources (0.1% w/v) was
inoculated with a 1:1 mixture of the wild-type (WT) S. Typhimurium
and the gudT-STM2959 mutant. The competitive index was determined after
24 h incubation in an anaerobic chamber. Experiments were performed with 3
biological replicates (a, b). c, Groups of streptomycin pre-treated or
mock-treated (no antibiotic) mice (C57BL/6) received the indicated
S. Typhimurium strains by oral gavage and bacteria were recovered
from the colon contents 4 days later. CFU, colony-forming units.
Grey shading indicates the average colonization levels of the wild-type
S. Typhimurium in mice that had not received antibiotics. Wild-type,
IR715; gudT-STM2959 mutant, FF162; complemented mutant,
FF162(pGUDT); n indicates the number of individual mice. Bars represent
geometric means ± s.e.m (b, c). A Student's t-test was applied to determine
statistical significance. NS, not statistically significant.

medium containing galactarate (Extended Data Fig. 1d). Both the
garD gene and the gudD gene conferred a fitness advantage in
streptomycin pre-treated mice (Extended Data Fig. 1e). Collectively, these
data suggested that streptomycin treatment increased the availability
of both glucarate and galactarate through a mechanism that was
streptomycin-dependent, but independent of acute colitis triggered by
S. Typhimurium virulence factors.

Antibiotic treatment increases the availability of sialic acid and fucose
in the large intestine. It has been proposed that these monosaccharides
are liberated by the resident microbiota from complex carbohydrates,
a conclusion based on the observation that sialic acid and fucose are
absent from caecal contents of germ-free mice¹¹. We thus investigated
the possibility that the gut microbiota might play a role in liberating
galactarate from complex carbohydrates in the intestine. Conventional
mice (non-germ-free with a normal microbiota) received either
streptomycin or vehicle control by oral gavage and the concentrations of
galactarate and glucarate were measured in caecal contents by GC/MS four
days later (Extended Data Fig. 3a, b). The concentration of galactarate
was low in mice that had not received antibiotics, and streptomycin
treatment resulted in a marked increase in the amount of galactarate


present in the caecum (P < 0.001) (Fig. 2b). Similarly, streptomycin 
treatment increased (P< 0.001) caecal glucarate concentrations 
(Fig. 2c). We reasoned that if galactarate and glucarate present in 
streptomycin-treated mice was microbiota-liberated, then the concen- 
trations of these sugars should be markedly reduced or absent in germ- 
free animals. Surprisingly, galactarate and glucarate levels measured 
by GC/MS in caecal contents of germ-free mice were similar or higher 
than those detected in conventional mice pre-treated with streptomycin 
(Fig. 2b, c). These data ruled out microbiota liberation as a possible 
mechanism by which streptomycin treatment elevated the availability 
of galactarate and glucarate in the murine large intestine. 
Streptomycin treatment of mice induces elevated caecal mucosal 
transcript levels of Nos2, the gene encoding inducible nitric oxide 
synthase (iNOS), through an unknown mechanism¹² (Extended Data
Fig. 4a). To determine whether the luminal fitness advantage conferred 
by galactarate/glucarate utilization genes required Nos2 expression, 
streptomycin pre-treated Nos2-deficient mice were infected either with 
a 1:1 mixture of the wild-type S. Typhimurium and a gudT-STM2959 
mutant or with a 1:1 mixture of an invA spiB mutant and an invA spiB 
gudT-STM2959 mutant. Remarkably, in each experiment, the luminal 
errors (a-g). A Student's t-test was used to determine
statistical significance. 


fitness advantage conferred by the galactarate/glucarate utilization 
genes after streptomycin treatment was abrogated in Nos2-deficient 
mice (Fig. 2a). These data suggested that the Nos2 gene was necessary 
to generate a substrate for enzymes encoded by the gudT ygcY gudD 
STM2959 operon. 
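The fitness advantage in these 1:1 competition experiments is reported as a competitive index. The text does not spell out the formula, so the sketch below uses the common definition (the output ratio of the two strains divided by their ratio in the inoculum) as an assumption, together with the geometric mean used for the bars in Fig. 1; the function names and the CFU values are hypothetical.

```python
import math


def competitive_index(wt_out, mut_out, wt_in=1.0, mut_in=1.0):
    """Ratio of wild type to mutant recovered, normalized by the ratio in the
    inoculum (assumed definition; a 1:1 inoculum gives wt_in / mut_in = 1)."""
    return (wt_out / mut_out) / (wt_in / mut_in)


def geometric_mean(values):
    """Geometric mean, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical CFU per gram recovered from four mice: (wild type, mutant)
recovered = [(2e8, 1e7), (5e7, 1e7), (1e8, 2e7), (4e8, 1e7)]
indices = [competitive_index(wt, mut) for wt, mut in recovered]
print(indices, geometric_mean(indices))
```

An index near 1 means that neither strain has an advantage, which is the pattern described for Nos2-deficient mice and for mice that did not receive streptomycin.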

To determine whether the streptomycin-induced increase in the 
caecal galactarate and glucarate concentrations (Fig. 2b, c) was Nos2- 
dependent, we measured galactarate and glucarate concentrations in 
caecal contents from Nos2-deficient mice four days after streptomycin 
treatment by GC/MS. Strikingly, streptomycin treatment did not 
increase the availability of these sugars in Nos2-deficient mice 
(Fig. 2b, c). These data further supported the idea that generation of 
post-antibiotic galactarate and glucarate required an intact Nos2 gene. 

To investigate whether the fitness advantage conferred by galactarate/ 
glucarate utilization required iNOS activity, streptomycin-treated mice 
(C57BL/6) received drinking water supplemented with aminoguanidine 
hydrochloride, a specific iNOS inhibitor!’, and were infected with a 
1:1 mixture of the wild-type S. Typhimurium and a gudT-STM2959 
mutant. Inhibition of iNOS activity with aminoguanidine (AG) significantly (P < 0.05)
blunted the fitness advantage conferred by the galactarate/glucarate 
utilization genes (Fig. 2a), suggesting that iNOS activity was required 
for generating galactarate and glucarate in the large intestine. 

The host enzyme iNOS uses L-arginine to produce nitric oxide (NO), 
a reactive nitrogen species'*. Reactive nitrogen species are known cata- 
lysts in the oxidation of alcohols and aldehydes’». We thus hypothesized 
that by generating reactive nitrogen species, streptomycin-induced 
iNOS synthesis might drive an oxidation of monosaccharides, thereby 
yielding the oxidation products galactarate and glucarate. To investigate 
whether reactive nitrogen species might oxidize galactose and glucose 
to galactarate and glucarate, respectively, we used 2,2,6,6-tetramethyl 
piperidine-1-oxyl (TEMPO), which is a stable free nitrosyl radical that 
mimics the activity of reactive nitrogen species (reviewed in ref. 16). 
Galactose and glucose were incubated in the presence of TEMPO and 
a co-oxidant (NaOCl) or the co-oxidant alone. Detection by GC/MS
indicated that TEMPO oxidized galactose to galactarate (Fig. 2d) and 
glucose to glucarate (Fig. 2e). Next, we investigated whether the mon- 
osaccharides galactose and glucose were present in caecal contents. 
Galactose and glucose were detected in caecal contents of conventional 
(C57BL/6) mice, Nos2-deficient mice and germ-free mice (Fig. 2f, g). 
Collectively, these data suggested that monosaccharides were present in 
the murine caecum and could be oxidized by reactive nitrogen species 
to yield galactarate and glucarate. 

Gene clusters for the utilization of galactarate and glucarate are 
also present in Escherichia coli and other related Enterobacteriaceae 
(Extended Data Fig. 1c). As treatment with streptomycin leads to an 
uncontrolled expansion of E. coli in the murine intestine!’, we investi- 
gated whether the underlying mechanism also involved utilization of 
galactarate and glucarate. To test this, we deleted the gudDXP and garD 
genes in the human E. coli isolate Nissle 1917 (Extended Data Fig. 1c). 
Deletion of the gudDXP garD genes rendered E. coli unable to grow 
with galactarate or glucarate as the sole carbon source, but did not affect 
its ability to utilize glycerate (Extended Data Fig. 5a). The gudDXP garD 
genes conferred a fitness advantage during growth of E. coli in the colon 
of streptomycin pre-treated mice, which was significantly (P < 0.05) 
diminished after treatment with the iNOS inhibitor aminoguanidine 
hydrochloride (Extended Data Fig. 5b). 

Here we show that by inducing the production of host-derived 
reactive nitrogen species, streptomycin treatment generates galactarate 
and glucarate in the gut lumen, thereby providing S. Typhimurium 
and E. coli with a considerable fitness advantage. Increases in galac- 
tarate and glucarate levels are also observed after treatment of mice 
with cefoperazone or a cocktail of vancomycin and bacitracin'*””. 
A post-antibiotic expansion of Enterobacteriaceae is of concern due to 
the recent emergence of carbapenem antibiotic resistance within this 
group. Exposure of patients in intensive care units to broad-spectrum 
antibiotics is a known risk factor for acquiring an infection with 
carbapenem-resistant E. coli and Klebsiella isolates”’. Our findings identify
host-mediated sugar oxidation as a new mechanism contributing to 
post-antibiotic expansion of Enterobacteriaceae. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Bacterial strains and growth conditions. S. Typhimurium and E. coli strains used 
in this study are listed in Extended Data Table 3. All cultures were routinely grown 
aerobically at 37°C in either Luria-Bertani (LB) broth (10g per litre tryptone, 
5g per litre yeast extract, 10 g per litre NaCl) or on LB agar plates (1.5% Difco agar) 
unless indicated otherwise. When necessary, antibiotics were added to the medium 
at the following concentrations: nalidixic acid (Nal) 50 mg per litre, kanamycin 
(Km) 100 mg per litre, chloramphenicol (Cm) 30 mg per litre, carbenicillin (Carb) 
100 mg per litre. 

Sugar fermentation assay. 5 ml of fermentation broth (peptone, 10 g per litre;
bromothymol blue, 0.024 g per litre; final pH 7.4 ± 0.1) supplemented with the
indicated carbon source (galactarate, glucarate, glucose, galactose, mannose or
rhamnose, 10 g per litre each) or the control broth (no sugar added) were inoculated
with 10 µl of an overnight culture of each indicated S. Typhimurium strain
and incubated statically at 37 °C for 24h. Fermentation of the sugar in the broth is 
indicated by a colour change from blue to yellow. 

Anaerobic growth assays. 10 ml of M9 minimal medium (75 g per litre
Na2HPO4·2H2O, 30 g per litre KH2PO4, 5 g per litre NaCl, 10 g per litre NH4Cl,
0.1 mM CaCl2, 1 mM MgSO4, 0.001% thiamine) supplemented with hog mucin
(0.1% w/v) or galactarate (0.04% w/v when added as sole carbon source, 0.1% 
w/v and 0.01% w/v when added to mucin broth) were inoculated with 2 x 10° 
colony-forming units (CFU) of each strain and incubated anaerobically at 37°C 
for 24 h inside an anaerobic chamber (Bactron I Anaerobic Chamber; Sheldon
Manufacturing, Cornelius). Bacterial numbers were determined by plating serial
tenfold dilutions onto LB agar containing the appropriate antibiotics. The ratios of
recovered wild-type and mutant bacteria after 24 h were normalized to the ratio at
0 h to calculate the competitive index.
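
The normalization behind the competitive index can be written out explicitly. The following is a minimal sketch rather than the analysis code used in the study; the function name and the CFU counts are hypothetical and serve only to illustrate the calculation.

```python
def competitive_index(wt_out, mut_out, wt_in, mut_in):
    """Wild-type:mutant ratio at recovery, normalized to the same ratio at 0 h.

    All arguments are CFU counts (or CFU per ml); the names are illustrative.
    """
    if min(wt_out, mut_out, wt_in, mut_in) <= 0:
        raise ValueError("all CFU counts must be positive")
    output_ratio = wt_out / mut_out   # ratio recovered after incubation
    input_ratio = wt_in / mut_in      # ratio in the inoculum (0 h)
    return output_ratio / input_ratio


# Hypothetical counts: a 1:1 inoculum in which the wild type outgrows the mutant.
ci = competitive_index(wt_out=5.0e8, mut_out=5.0e6, wt_in=1.0e3, mut_in=1.0e3)
print(f"competitive index = {ci:.1f}")
```

A value above 1 indicates that the wild type was recovered in greater proportion than it was inoculated; a value near 1 indicates no fitness difference under the tested condition.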

Construction of plasmids. Standard cloning techniques were used to generate the 
plasmids used in this study. All plasmids and primers used in this study are listed 
in Extended Data Tables 4 and 5. PCR products were confirmed by sequencing 
(SeqWright Fisher Scientific, Houston). Suicide plasmids were propagated in 
E. coli DH5α λpir. Plasmid pFF35 was constructed by PCR amplifying a 5’ flanking
fragment of gudT using primers 71 and 72 and a 3’ flanking fragment of STM2959
using primers 73 and 74. The two PCR fragments were gel purified, digested with
XbaI and ligated with T4 DNA ligase (NEB). The ligation mix served as a template
for a PCR with primers 71 and 74 and the product was gel purified and cloned into 
pCR2.1 using the TOPO TA cloning kit (Invitrogen). The plasmid construct was 
confirmed by sequencing and designated pFF35. To generate a suicide plasmid for 
replacing the gudT ygcY gudD STM2959 genes with a kanamycin (KSAC) cassette, 
the insert of plasmid pFF35 was excised using BamHI and ligated into the BamHI 
site of the suicide plasmid pRDH10. The resulting plasmid was digested with XbaI 
and ligated with a KSAC cassette generated from pBS34 by digestion with XbaI.
The resulting suicide plasmid was designated pFF57. 

To construct plasmids pFF62 and pFF63, respectively, chromosomal regions 
upstream and downstream of gudD and garD in IR715 were amplified by PCR 
and cloned into BamHI-digested pRDH10 using Gibson Assembly Master Mix 
(NEB). To construct plasmids pFF64 and pFF65, respectively, chromosomal 
regions upstream and downstream of the gudDXP operon and of garD from 
E. coli Nissle 1917 were amplified by PCR and cloned into BamHI digested 
pRDH10 using Gibson Assembly Master Mix (NEB). 

For complementation of the gudT-STM2959 mutant, the gudT ygcY gudD 
STM2959 operon including its promoter region was PCR amplified using primers 
92 and 93 or 94 and 95. The two PCR fragments were gel purified and cloned 
into BamHI digested pWSK29 using Gibson Assembly Master Mix (NEB). The 
complementation plasmid was verified by sequencing and designated pGUDT.

Construction of mutants in S. Typhimurium. All suicide plasmids were introduced
into S. Typhimurium IR715 recipient strains by conjugation using E. coli
S17-1 λpir as the donor strain. Exconjugants were selected on LB + Nal + Cm to
select for clones that had integrated the suicide plasmid. Sucrose counter-selection
was performed as published previously". Strains that were sucrose resistant and
chloramphenicol sensitive (CmS) were verified by PCR.

Plasmid pFF57 was introduced into FF7 and SPN487 to generate FF162 
(ΔgudT-STM2959::KmR) and FF217 (ΔinvA ΔspiB ΔgudT-STM2959::KmR),
respectively. Plasmid pFF62 was introduced into AJB715 to generate FF464
(phoN::KmR ΔinvA ΔspiB gudD). Plasmid pFF63 was introduced into AJB715 to
generate strain FF461 (phoN::KmR ΔinvA ΔspiB garD).

Construction of mutants in E. coli Nissle 1917. Suicide plasmids were intro- 
duced into E. coli Nissle 1917(pSW172) recipient strains by conjugation using 
E. coli S17-1 λpir as the donor strain. To ensure stable propagation of pSW172,
all steps of the conjugation were performed at 30°C. Exconjugants were selected 
on LB + Carb + Cm to select for clones that had integrated the suicide plasmid. 
Subsequent sucrose counter-selection was performed as published previously”!. 
Strains that were sucrose resistant and chloramphenicol sensitive (CmS) were verified by PCR. If appropriate,
pSW172 was cured by growing the resulting mutant strains at 37°C. Plasmids
pFF64 and pFF65 were successively introduced into E. coli Nissle 1917(pSW172) 
to generate FF441 (E. coli Nissle 1917 gudDXP garD). 

Generalized phage transduction. Phage P22 HT105/1 int-201 was used for 
generalized transduction. A phage lysate of strain CS019 was used to transduce 
IR715 wild-type and the ΔinvA ΔspiB mutant (SPN487) to CmR, generating
FF176 (IR715 phoN::Tn10d-Cam) and FF183 (IR715 phoN::Tn10d-Cam ΔinvA
ΔspiB), respectively. Transductants were cleaned of phage contamination
on Evans blue-Uranine (EBU) agar plates and tested for phage sensitivity 
by cross-streaking against P22 H5. The strains were verified by PCR to be 
phoN::Tn10d-Cam. 

Animal experiments. No statistical methods were used to predetermine sample 
size. The experiments were not randomized. The investigators were blinded to 
allocation of mice for assessment of histopathology and readouts of inflammation. 
All animal experiments were approved by the Institutional Animal Care and Use 
Committees at the University of California, Davis. Female C57BL/6J wild-type and
Nos2-deficient mice (B6.129P2-Nos2tm1Lau/J) aged 9-12 weeks were obtained from
The Jackson Laboratory (Bar Harbor). Mice were pre-treated with an oral dose of
20 mg streptomycin or sterile water 24h before infection with S. Typhimurium or 
E. coli Nissle 1917, respectively. For single infections with Salmonella, mice were 
inoculated intra-gastrically with 2 x 10° CFU of the indicated S. Typhimurium 
strains. For competitive infections, mice were inoculated with a 1:1 mixture of 
each indicated strain. For infection with E. coli Nissle 1917, mice were infected with 
2 x 10° CFU of a 1:1 mixture of the indicated strains. In some experiments, drinking
water was supplemented with aminoguanidine hydrochloride (1 mg ml⁻¹). Mice
were euthanized 4 days after infection. The colon contents were collected for
enumeration of bacterial numbers, and the distal caecum was collected for histopathology
scoring. Bacterial numbers were determined by plating serial tenfold dilutions onto
LB agar containing the appropriate antibiotics. The ratio of recovered wild-type to
mutant bacteria in colon contents was normalized to the ratio in the inoculum
to calculate the competitive index. For measurements of sugar concentrations in
caecal contents, mice received an oral dose of either 20 mg streptomycin or sterile
water. Mice were euthanized 4 days after streptomycin treatment and caecal
contents were snap frozen in liquid nitrogen and stored at −80 °C. The distal caecal
tissue was collected for RNA isolation.

Germ-free Swiss Webster mice were obtained from Taconic Farms. The mice 
were bred and housed under germ-free conditions inside gnotobiotic isolators 
(Park Bioservices, LLC). Weekly 16S PCR and cultures were performed to
evaluate the germ-free status of the mice. For experiments, male and female 
6-8-week-old mice were transferred to a biosafety cabinet and maintained in 
sterile cages for the duration of the experiment. Caecal contents were snap frozen 
in liquid nitrogen and stored at −80 °C until further processing for metabolite
measurements. 

Quantitative real-time RT-PCR analysis. Murine caecal tissue was collected, snap 
frozen in liquid nitrogen and stored at −80 °C. Expression analysis was performed
as described previously”. Briefly, RNA was extracted using TRI reagent (Molecular 
Research Center, Cincinnati) according to the manufacturer’s protocol. Isolated 
RNA was DNase-treated (Applied Biosystems) and cDNA was synthesized from 
1 µg of RNA using TaqMan reverse transcription reagents (Applied Biosystems).
Real-time PCR was performed using SYBR-Green (Applied Biosystems) and the 
primers listed in Extended Data Table 5. The changes in mRNA levels of target 
genes were calculated using the comparative Ct method (Applied Biosystems) and 
normalized to the levels of B2m mRNA. 
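
The comparative Ct calculation can be illustrated as follows. This is a minimal sketch that assumes roughly 100% amplification efficiency for both primer pairs (a doubling per cycle); the function name and the Ct values are hypothetical and are not taken from the study.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    The target gene (for example Nos2) is normalized to a reference gene
    (for example B2m) in each sample, then expressed relative to the control.
    """
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    delta_delta = delta_treated - delta_control          # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in the treated sample, while the reference gene is unchanged.
fold = fold_change_ddct(ct_target_treated=24.0, ct_ref_treated=18.0,
                        ct_target_control=27.0, ct_ref_control=18.0)
print(f"fold change = {fold:.1f}")
```

If the amplification efficiencies of the two primer pairs differ appreciably, an efficiency-corrected model would be needed instead of the fixed base of 2 used here.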

Histopathology. The distal caecal tissue was fixed in phosphate-buffered formalin 
and 5 µm sections of the tissue were stained with haematoxylin and eosin. Blinded
scoring of tissue sections was performed by a veterinary pathologist based on the 
criteria listed in Extended Data Tables 1 and 2. Images were taken with a Zeiss 
Primo Star microscope with a 10x objective. 

TEMPO-mediated oxidation of sugars. The oxidation reactions were 
carried out at room temperature in an open beaker fitted with a pH-meter 
electrode and a thermometer to monitor the reactions. The pH was kept constant 
at 8 by titration with a 0.5 M NaOH solution. D-galactose (2 g) and TEMPO
(16 mg, 2,2,6,6-tetramethyl-1-piperidinyloxy, free radical, Sigma-Aldrich) were 
dissolved in 250 ml ultrapure water. As a control some of the oxidations were 
done without adding TEMPO to the reaction mixture. To start the oxidation, 
sodium hypochlorite solution (a total of 11 ml of 10-15% NaClO, Sigma-
Aldrich) was added at a rate that kept the pH from exceeding 8.0. After adding the entire
NaClO solution the pH was kept at 8.0 until no further pH shift was detectable.
The reaction mixture was precipitated with 3 volumes of > 95% ethanol at 4°C 
for 48 h, filtered (Whatman qualitative filter paper, grade 602 h), washed with 
acetone and dried at 50°C for 10-15h. The resulting powdered mixture was 
used as a carbon source for in vitro growth assays and analysed by GC/MS for 
the presence of galactarate. 

Measurement of sugar concentrations by GC/MS. Measurements were done
at the West Coast Metabolomics Center at UC Davis as previously described”’.
20 mg of each sample, with D-Glucose-C-d7 added as the internal standard, were
extracted with 1 ml of a pre-chilled acetonitrile:isopropanol:water (3:3:2) mixture.
450 µl aliquots of the supernatants were evaporated to dryness and subjected to
a two-step derivatization using methoximation and trimethylsilylation. GC/MS 
analysis was performed using an Agilent 7890 Gas Chromatography system 
coupled to an Agilent 5977A Mass spectrometer. An Rtx-5Sil MS w/Integra-Guard 
column (30 m × 250 µm i.d., Restek), chemically bonded with a
1,4-bis(dimethylsiloxy)phenylene-dimethyl polysiloxane cross-linked stationary phase (0.25 µm
film thickness) was used to separate the derivatives. Helium was used as a
carrier gas at a constant flow rate of 1.2 ml min⁻¹. The GC oven temperature was
programmed to increase from 50 °C to 325 °C at a rate of 10 °C min⁻¹. The temperatures
of the injector, transfer line, electron impact (EI) ion source, and quadrupole
were set to 250 °C, 290 °C, 230 °C and 150 °C, respectively. The mass spectrometer
was set to scan at a sampling rate of 4 and data were collected in a full scan mode
(m/z 50 to 600). For quantification of sugars in the samples, a six-point calibration
curve was prepared with D-Glucose-C-d7 as internal standard. Agilent Mass
Hunter quant software was used for data analysis. 
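
Quantification against an internal standard can be sketched as below. The text states only that a six-point calibration curve with an internal standard was used; the linear least-squares fit, the function names, and all numbers here are assumptions made for illustration and do not reproduce the vendor software.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(area_analyte, area_internal_std, slope, intercept):
    """Convert a peak-area ratio (analyte / internal standard) to a concentration."""
    response_ratio = area_analyte / area_internal_std
    return (response_ratio - intercept) / slope


# Hypothetical six-point calibration: known concentrations (arbitrary units)
# against the measured analyte / internal-standard peak-area ratio.
concentrations = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
ratios = [0.03, 0.05, 0.11, 0.21, 0.41, 1.01]
slope, intercept = fit_line(concentrations, ratios)

# Hypothetical sample: peak areas for the analyte and the spiked standard.
estimate = quantify(area_analyte=4100.0, area_internal_std=10000.0,
                    slope=slope, intercept=intercept)
print(f"estimated concentration = {estimate:.2f}")
```

Dividing by the internal-standard area corrects for sample-to-sample variation in extraction, derivatization and injection, which is the reason the standard is added before extraction.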

Statistical analysis. The fold-changes of ratios for bacterial numbers and mRNA 
levels, respectively, and values for sugar concentrations were logarithmically trans- 
formed for statistical analysis. An unpaired Student's t-test was used to determine 
whether differences between groups were statistically significant (P < 0.05). Error 
bars indicate standard error of the mean (s.e.m.).
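
The analysis described above (log transformation followed by an unpaired Student's t-test) can be sketched with the Python standard library alone. This is an illustrative sketch, not the software used in the study: the base-10 logarithm, the numerical integration used for the P value, the function names and the example data are all assumptions.

```python
import math
from statistics import mean, variance


def geometric_mean(values):
    """Geometric mean, computed as the back-transformed mean of log10 values."""
    return 10 ** mean(math.log10(v) for v in values)


def student_t(a, b):
    """Unpaired, equal-variance Student's t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def two_sided_p(t, df, steps=2000):
    """Two-sided P value from Simpson integration of the t density (steps must be even)."""
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3          # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


# Hypothetical competitive indices from two groups of four animals each.
group_a = [100.0, 200.0, 50.0, 120.0]
group_b = [2.0, 1.0, 4.0, 1.5]
log_a = [math.log10(v) for v in group_a]
log_b = [math.log10(v) for v in group_b]

t_stat, dof = student_t(log_a, log_b)
p_value = two_sided_p(t_stat, dof)
print(f"geometric means: {geometric_mean(group_a):.1f} vs {geometric_mean(group_b):.2f}")
print(f"t = {t_stat:.2f}, df = {dof}, significant at 0.05: {p_value < 0.05}")
```

Ratios such as competitive indices and fold changes are multiplicative, so testing on the log scale and reporting geometric means keeps the summary statistic and the test consistent with each other.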

Extended Data Figure 1 | Galactarate/glucarate fermentation by
S. enterica. a, One of the biochemical reactions used in the Salmonella
serotyping scheme by Kauffmann and White is the ability to ferment
galactarate®. We divided 1,367 S. enterica subspecies enterica serovars into
two groups: those associated with extraintestinal disease (serovars Typhi,
Paratyphi A, Paratyphi B, Paratyphi C, Sendai, Choleraesuis, Typhisuis,
Dublin, Bovismorbificans, Abortusovis, Abortusequi, Gallinarum biovar
Gallinarum and Gallinarum biovar Pullorum) and those associated
with human gastroenteritis (the remaining 1,354 serovars). The bar graph
shows the percentages of serovars in each group that are positive, negative,
delayed or differing (some isolates within the serovar are positive while
others are negative) for this reaction. b, Detection of galactarate in
chow for conventional or germ-free mice using GC/MS (n = 4).
c, Schematic drawing of the two gene clusters encoding proteins involved
in the degradation of glucarate and galactarate in S. Typhimurium
(ATCC14028), E. coli (Nissle 1917) and Klebsiella oxytoca (KCTC1686).
Arrows indicate genes. The bracket indicates the DNA region deleted in
the indicated mutants. d, Minimal medium or mucin broth supplemented
with the indicated carbon sources (0.1% w/v) was inoculated with a
1:1 mixture of the wild-type S. Typhimurium and indicated mutants.
Competitive index (CI) recovered after 24 h incubation in an anaerobic
chamber. e, Streptomycin-treated C57BL/6 mice (n = 6) were infected with
a 1:1 mixture of the indicated S. Typhimurium strains and the competitive
index in colon contents determined 4 days after infection. Bars represent
geometric means ± standard errors (d, e). A Student's t-test was applied to
determine statistical significance.

Extended Data Figure 2 | Evaluation of caecal inflammation in
streptomycin-treated mice 4 days after S. Typhimurium infection.
a, Streptomycin pre-treated mice were infected with the indicated
strain mixtures and caecal histopathology was scored four days later
for four mice per group. The criteria used for histopathology scoring
are listed in Extended Data Table 4. Each bar represents data from an
individual animal. b, Representative images of haematoxylin and eosin
(H&E)-stained caecal sections scored in a, along with an image from a
mock-infected mouse for comparison. All images were taken at the same
magnification. m, mucosa; s, submucosa; ml, muscle layer; lu, lumen.

Extended Data Figure 3 | Detection of galactaric acid and glucaric acid by GC/MS. a, Representative GC elution profile of a caecal sample containing
galactaric acid and glucaric acid (arrows). b, Representative single ion monitoring scan spectrum of galactaric acid and glucaric acid.

Extended Data Figure 4 | Elevated Nos2 expression leads to nitrosyl
radical-mediated oxidation of galactose. a, Expression levels of Nos2
mRNA in RNA isolated from the caecal tip three days after mock
treatment (mock) or treatment of mice with streptomycin (Strep) were
determined by quantitative real-time PCR. Bars represent geometric
means ± standard errors. A Student's t-test was applied to determine
statistical significance. b, Schematic of the oxidation of galactose to
galactarate by TEMPO. 2,2,6,6-tetramethyl piperidine-1-oxyl (TEMPO) is
a stable free nitrosyl radical that can oxidize terminal alcohol and aldehyde
groups to carboxyl groups!*. Consumption of TEMPO during the
redox reaction is prevented by addition of a co-oxidant (NaOCl), which
regenerates the nitrosyl radical.

Extended Data Figure 5 | Galactarate/glucarate fermentation by E. coli.
a, Minimal medium or mucin broth supplemented with the indicated
carbon sources was inoculated with a 1:1 mixture of the E. coli wild type
(wt) and a gudDXP garD mutant. CI, competitive index recovered after 24 h
incubation in an anaerobic chamber. Growth was verified with 3 biological
replicates. b, Streptomycin-treated C57BL/6 mice (n = 6) were infected
with a 1:1 mixture of the indicated E. coli strains and received the iNOS
inhibitor aminoguanidine (AG) or vehicle control. The competitive index
in colon contents was determined four days after infection. Bars represent
geometric means ± standard errors (a, b). A Student’s t-test was applied to
determine statistical significance.

Extended Data Table 1 | Chart indicating scoring criteria for blinded examination of H&E-stained sections from the caecum

Submucosal edema | Epithelial damage | Exudate | PMN infiltration* | Mononuclear cell infiltrate*
No changes | No changes | No changes | No changes | No changes
Detectable (<10%) | Desquamation | Slight accumulation | 6-20 | 5-10
Mild (10%-20%) | Mild erosion | Mild accumulation | 21-60 | 10-20
Moderate (20%-40%) | Marked erosion | Moderate accumulation | 60-100 | 20-40
Marked (>40%) | Ulceration | Marked accumulation | >100 | >40

*Number of cells per high-magnification microscopic field.

Extended Data Table 2 | Blinded histopathology scoring scheme

Combined score | Description
Severe inflammation
Mild inflammation

© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Table 3 | Bacterial strains used in this study

E. coli
TOP10 | F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZΔM15 ΔlacX74 recA1 araD139 Δ(ara-leu)7697 galU galK rpsL endA1 nupG | Invitrogen
DH5α λpir | F- endA1 hsdR17 (r-m+) supE44 thi-1 recA gyrA relA1 mel Δ(lacZYA-argF)U189 Φ80lacZΔM15 λpir | Lab stock
S17-1 λpir | zxx::RP4 2-(TetR::Mu) (KmR::Tn7) λpir
FF441 | Nissle 1917 gudDXP garD | This Study

Salmonella
ATCC 14028 | Wild-type isolate of S. enterica serovar Typhimurium
IR715 | Nalidixic acid-resistant derivative of ATCC14028
SPN487 | IR715 ΔinvA ΔspiB
FF176 | IR715 phoN::Tn10d-Cam | This Study
CS019 | ATCC 14028 phoN::Tn10d-Cam

References 24-28 are cited in this table.

Extended Data Table 4 | Plasmids used in this study

pFF35 | 5’ and 3’ flanking regions of gudT ygcY gudD STM2959 operon in pCR2.1; CarbR, KmR | This Study
pFF57 | KSAC cassette flanked by up-/downstream regions of the gudT ygcY gudD STM2959 operon in pRDH10; CmR, KmR | This Study
pFF64 | Up-/downstream regions of the gudDXP operon from EcN in pRDH10 | This Study
pFF65 | Up-/downstream regions of garD from EcN in pRDH10 | This Study
pGUDT | gudT ygcY gudD STM2959 operon under the control of its native promoter in pWSK29; CarbR | This Study

References 29 and 30 are cited in this table.

Extended Data Table 5 | Primers used in this study

71 | 5'-GGATCCTCTGAACCGCTGCTAATGG-3'
72 | 5'-TCTAGAGTTACGCTGAGTTGTAGG-3'
73 | 5'-TCTAGAGTAGGGAATCAGAGATAAGG-3'
74 | 5'-GGATCCAGGGAGATACGCATAATGG-3'

94 | 5'-CTTGCATGGTGCGTTAAGTC-3'
95 | 5'-CGCTCTAGAACTAGTGATCCGGCCTACAACTCAGC-3'


Deletion of gudDXP operon in E. coli Nissle 1917 

136 CACACCCGTCCTGTGCTGTGTTTATGCCGGATG 
137 CCGGTTCGTTCCCTGGCGATGTTTAC 

138 CCAGGGAACGAACCGGCAATAGAAAGC 

139 GCGTCCGGCGTAGAGTTTGCCTGGAGTCAAGCG 
Deletion of garD in E. coli Nissle 1917 

235 CACACCCGTCCTGTGTGGCCAACATCAAAATCAG 
236 CACCGGTGGTTCGGGTATTTCGGTAG 

237 ACCCGAACCACCGGTGACCTGATTTC 

238 GCGTCCGGCGTAGAGGCCAGCGACAAGTTTCTTTC 
Quantitative real-time RT-PCR

Organism | Target | Sequence
Mus musculus | B2M | 5'-GGTCTTTCTGGTGCTTGTCTCA-3'
Mus musculus | B2M | 5'-GTTCGGCTTCCCATTCTCC-3'
Mus musculus | Nos2 | 5'-TTGGGTCTTGTTCACTCCACGG-3'
Mus musculus | Nos2 | 5'-CCTCTTTCAGGTCACTTTGGTAGG-3'

*Restriction enzyme sites are underlined, overlapping sequences for Gibson Assembly are in bold.

Coordinating cardiomyocyte interactions to direct
ventricular chamber morphogenesis 


Peidong Han!, Joshua Bloomekatz!, Jie Ren!, Ruilin Zhang!, Jonathan D. Grinstein!, Long Zhao’, C. Geoffrey Burns’, 


Caroline E. Burns’, Ryan M. Anderson? & Neil C. Chi’ 


Many organs are composed of complex tissue walls that are 
structurally organized to optimize organ function. In particular, 
the ventricular myocardial wall of the heart comprises an outer 
compact layer that concentrically encircles the ridge-like inner 
trabecular layer. Although disruption in the morphogenesis of 
this myocardial wall can lead to various forms of congenital heart 
disease! and non-compaction cardiomyopathies’, it remains unclear 
how embryonic cardiomyocytes assemble to form ventricular wall 
layers of appropriate spatial dimensions and myocardial mass. Here 
we use advanced genetic and imaging tools in zebrafish to reveal 
an interplay between myocardial Notch and Erbb2 signalling that 
directs the spatial allocation of myocardial cells to their proper 
morphological positions in the ventricular wall. Although previous 
studies have shown that endocardial Notch signalling non-cell- 
autonomously promotes myocardial trabeculation through Erbb2 
and bone morphogenetic protein (BMP) signalling’, we discover 
that distinct ventricular cardiomyocyte clusters exhibit myocardial 
Notch activity that cell-autonomously inhibits Erbb2 signalling and 
prevents cardiomyocyte sprouting and trabeculation. Myocardial- 
specific Notch inactivation leads to ventricles of reduced size and 
increased wall thickness because of excessive trabeculae, whereas 
widespread myocardial Notch activity results in ventricles of 
increased size with a single-cell-thick wall but no trabeculae. 
Notably, this myocardial Notch signalling is activated non-cell- 
autonomously by neighbouring Erbb2-activated cardiomyocytes 
that sprout and form nascent trabeculae. Thus, these findings 
support an interactive cellular feedback process that guides 
the assembly of cardiomyocytes to morphologically create the 
ventricular myocardial wall and more broadly provide insight 
into the cellular dynamics of how diverse cell lineages organize to 
create form. 

The embryonic zebrafish heart comprises 200-300 cardiomyocytes 
when cardiac chambers form‘, and thus provides an opportunity to 
interrogate in detail how individual cardiomyocytes organize to create 
the nascent structures of the vertebrate embryonic ventricular wall. As 
a result, previous zebrafish studies have shown that distinct cardiomy- 
ocytes extend from the embryonic ventricular wall into the lumen to 
develop cardiac trabeculae®, whereas others remain within this outer 
wall to create the primordial layer*. Yet, how these cardiomyocytes are 
selected to form the distinct myocardial layers of the ventricular wall 
remains to be fully elucidated. 

Because of the role of Notch signalling in regulating cell-cell 
interactions®’, we examined its dynamic activation during zebraf- 
ish embryonic ventricular morphogenesis using the Tg(Tp1:d2GFP) 
Notch reporter line, which expresses a destabilized green fluorescent 
protein upon Notch activation® (Fig. 1 and Extended Data Fig. 1). As 
previously reported’, we observed Notch signalling first in the ven- 
tricular endocardium at 24 hours post-fertilization (hpf), which then 


becomes restricted to the atrioventricular (AV) and outflow tract (OFT) 
endocardium at 48 hpf (Extended Data Fig. la-l). From 72 to 96 hpf 
when cardiac trabeculation initiates>!°",, a subset of ventricular cardi- 
omyocytes begins to express Notch-activated Tp1:d2GFP and remains 
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Figure 1 | Notch signalling is dynamically activated in distinct 
myocardial clusters during cardiac morphogenesis. Cardiac ventricles 
at 72 hpf, 96 hpf, and 14 dpf expressing (a-k, m-o) Tp1:d2GFP; 
myl7:mCherry or (q-s) Tp1:d2GFP; myl7:H2A-mCherry. a-k, Confocal 
slices; m-o, q-s, three-dimensional reconstructions. b, c—d, f-h, j-k, 
Magnifications of boxed areas in a, b, e, and i, respectively. Images 

c and d, g and h, andj and kare single channels from b, f, and i merged 
images, respectively. 1, Schematic of myocardial Notch signalling. 

p, t, Quantification of (p) myocardial Tp1:d2GFP* clusters and 

(t) cardiomyocytes per Tp1:d2GFP* cluster. n, Number of embryos 
analysed per stage. White arrows, Tp1:d2GFP* cardiomyocytes; white 
arrowheads, trabeculating cardiomyocytes; yellow arrows, cardiomyocytes 
in Tp1:d2GFP* clusters. White and yellow asterisks, AV and OFT. 

Mean + s.e.m. Scale bar, 25 um. 
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Figure 2 | Myocardial Notch signalling cell-autonomously regulates 
cardiomyocyte segregation between ventricular wall layers. Inhibiting 
Notch signalling by (b, j) DAPT treatment, (d, 1) global dnaMAML-GFP 
(hsp70I:dnM), or (f, n) myocardial-specific d(nMAML expression 
(myl7:Cre; ubi:RSdnM) leads to excessive trabeculation at 72 hpf, whereas 
(h, p) myocardial-specific constitutive Notch activation via NICD 
expression (myl7:Cre; hsp701:RSN) diminishes trabeculation at 120 hpf. 
a, c, e, g, i, k, m, 0, Respective controls for each condition. a—p, For 
quantification see Extended Data Fig. 5. q, Myocardial priZm (brainbow) 
clonal studies. r, s, The 72 hpf myl7:CreER; priZm myocardial clones 
treated with DMSO or DAPT at 60 hpf. t, Although DMSO- and DAPT- 
treated ventricles display a similar overall number of myocardial clones 
(blue) (n= 10 and 11 embryos), DAPT-treated ventricles exhibit more 


in the ventricular outer wall (Fig. 1a–h, arrows, and Extended Data 
Fig. 1m–o, arrows), whereas ventricular cardiomyocytes extending to 
form cardiac trabeculae fail to express Tp1:d2GFP (Fig. 1a–h, arrowheads). 
These Tp1:d2GFP+ cardiomyocytes are frequently adjacent to 
sprouting Tp1:d2GFP− cardiomyocytes (Fig. 1a–h) and form clusters 
of two or three cardiomyocytes across the surface of the ventricular 
myocardial wall (Fig. 1m–t), which quantitatively correlate with the 
number of emerging cardiac trabeculae (Extended Data Fig. 1p–r). 
However, after the heart has established cardiac trabeculae, these 
Tp1:d2GFP+ myocardial clusters progressively decrease and are no 
longer observed by 14 days post-fertilization (dpf), despite the presence 
of Tp1:d2GFP+ AV and OFT endocardial cells (Fig. 1i–l, o, p, s, t 
and Extended Data Fig. 1s, t). 

Using the Tg(Tp1:eGFP) Notch reporter line¹², which expresses a 
more stable fluorescent protein than Tg(Tp1:d2GFP) (Extended Data 
Fig. 2a–h), we observed that Tp1:eGFP+ cardiomyocytes remain 
present in the ventricular outer wall until 30–45 dpf because of the 
eGFP perdurance, and then become the ventricular primordial layer 
by 60–90 dpf when the ventricular cortical layer of the adult zebrafish 
heart forms‘ (Extended Data Fig. 3a–n). Unlike ventricular trabecular 
and cortical cardiomyocytes, these Tp1:eGFP+ ventricular primordial 
cardiomyocytes fail to display organized sarcomeres by α-actinin 
immunostaining, are surrounded by extensive wheat germ agglutinin- 
stained extracellular matrix and exhibit a thin cellular morphology as 
previously reported* (Extended Data Fig. 3o–t). Overall, these data 
suggest that early myocardial Notch signalling may determine which 
ventricular cardiomyocytes remain in the embryonic ventricular outer 
wall to subsequently become the distinctive ventricular primordial layer 
of the adult zebrafish heart. 
clones in trabeculae (red) and fewer in the outer ventricular wall (green), 
compared with control. Crosses, mean and s.e.m. *P < 0.05, by Student’s 
t-test. NS, not significant. u, Notch-altering mosaic cardiomyocyte 
studies. w, Constitutively activated Notch cardiomyocytes expressing 
NICD-P2A-Emerald are primarily located on the ventricular outer 
wall (n = 13/14 clones, Fisher’s exact test, P < 0.05), whereas (y) Notch- 
inhibited cardiomyocytes expressing dnSuH-P2A-Emerald are mainly 
found in trabeculae (n = 15/18 clones, Fisher’s exact test, P < 0.05). v, x, In 
controls lacking Tg(myl7:Cre), mCherry+ cardiomyocytes are distributed 
equally between both layers (n = 11/21 and 14/26 clones in the outer wall). 
z, Quantitative analysis of v–y. Insets are magnifications of boxed areas. 
Arrowheads and arrows, trabeculae and outer wall cardiomyocytes. HS, 
heat shock. Scale bar, 25 μm. 


We next investigated the role of Notch signalling in the endocardium 
and myocardium during ventricular morphogenesis through selectively 
perturbing Notch signalling at specific cardiac developmental stages. 
Treating zebrafish embryos with DAPT, which effectively decreases 
Tp1:d2GFP Notch reporter expression and inhibits Notch signalling 
(Extended Data Fig. 2), from 20 to 48 hpf when Notch signalling is 
activated in the endocardium, reduces cardiac trabeculation (Extended 
Data Fig. 2q-s) as previously described’; however, treating from 60 to 
72 hpf, when Notch signalling is present in the ventricular myocardium, 
results in increased trabeculae formation (Fig. 2a, b, i, j). Consistent 
with these results, BMP signalling, which is activated in ventricular 
trabeculae’, is also increased in similarly DAPT-treated zebrafish 
embryos from 60 to 72 hpf (Extended Data Fig. 4a–f). Furthermore, 
heat-shocking Tg(hsp70l:dnMAML-GFP)¹³ (abbreviated 
as hsp70l:dnM) embryos from 60 to 72 hpf, which induces dominant 
negative Mastermind-like (dnMAML) expression to block downstream 
Notch signalling, results in similar excessive trabeculation 
(Fig. 2c, d, k, l). 

To explore whether Notch signalling functions in a cardiomyocyte- 
specific manner to directly guide myocardial cell fate position within 
the ventricle, we employed a myocardial-specific Cre (Extended Data 
Fig. 5a–d) strategy in combination with Tg(ubi:loxp-mKate2-STOP- 
loxp-dnMAML-GFP) or Tg(hsp70l:loxp-mCherry-STOP-loxp- 
NICD-P2A-Emerald)¹⁴ ‘switch lines’ (abbreviated as ubi:RSdnM 
and hsp70l:RSN) to inhibit or activate Notch signalling in cardiomyocytes, 
respectively. As observed in DAPT-treated and heat- 
shocked Tg(hsp70l:dnM) zebrafish from 60 to 72 hpf, Tg(myl7:Cre; 
ubi:RSdnM) zebrafish display excessive cardiac trabeculation due to 
inhibition of myocardial Notch signalling (Fig. 2e, f, m, n). Conversely, 
Figure 3 | Myocardial Erbb2 signalling non-cell-autonomously activates 
Notch signalling in neighbouring cardiomyocytes. Compared with 
(a) controls (n = 0/15 embryos), Tp1:d2GFP; myl7:mCherry (b) erbb2 
MO (n = 10/12) and (d) erbb2−/− mutants (n = 10/10) display reduced 
trabeculae and myocardial Notch signalling. c, e, DAPT treatment at 
60 hpf cannot rescue these myocardial defects, but can diminish AV 
and OFT endocardial Notch signalling (asterisks) (n = 15/17, 17/17 
embryos, respectively). f–i, Blastomere transplantation studies. Compared 


heat-shocking Tg(myl7:Cre; hsp70l:RSN) zebrafish, which induces 
myocardial Notch-intracellular domain (NICD) expression, between 
60 and 120 hpf leads to cardiac ventricles without significant trabeculae 
because of constitutively activated Notch signalling throughout the 
myocardium (Fig. 2g, h, o, p). Moreover, constitutive myocardial Notch 
activation at later time points (80, 96, and 120 hpf to 7 dpf) prevents 
trabeculae from further sprouting and/or extending (Extended Data 
Fig. 6a–g); however, trabeculae continue to develop after the cessation 
of this myocardial Notch activity, but fail to recover to wild-type levels 
(Extended Data Fig. 6h, i). 

In line with these findings, we discovered that Notch inhibition 
results in smaller ventricular areas (Fig. 2a-f and Extended Data Fig. 5e) 
and thicker ventricular myocardial walls (Fig. 2a-f and Extended 
Data Fig. 5f) due to increased cardiomyocytes within the trabecular 
layer (approximately two or three cells thick) (Fig. 2i-n, Extended 
Data Fig. 5g), whereas Notch activation gives rise to larger ventricu- 
lar areas (Fig. 2g, h, Extended Data Fig. 5e) and thinner ventricular 
myocardial walls (Fig. 2g, h and Extended Data Fig. 5f) that are about 
one cell thick with no apparent trabecular cardiomyocytes (Fig. 2o, p 
and Extended Data Fig. 5g). Although these hearts do not exhibit a 
significant difference in overall cardiomyocyte numbers compared 
with control hearts (Extended Data Fig. 5h-p), we did discover that 
Notch inhibition promotes the redistribution of N-cadherin away from 
intercellular contacts whereas Notch activation prevents this reorgan- 
ization (Extended Data Fig. 7), suggesting that myocardial Notch 
signalling may control ventricular size and wall thickness through 
regulating the allocation of cardiomyocytes between the ventricular 
wall layers via cell-cell contacts. To further investigate this possibility, 
we monitored the fate of individual genetically labelled cardiomyocytes 
using a myocardial-specific Brainbow system Tg(myl7:CreER; priZm)⁴ 
(Fig. 2q-t). After confirming that adjacent cardiomyocytes were con- 
sistently labelled with different colours at 60 hpf before trabeculation 
(Extended Data Fig. 8), we treated zebrafish embryos with DAPT or 
with (g) control MO hosts (n = 12 embryos), (i) a greater percentage 
of donor Tg(myl7:Cerulean) wild-type cardiomyocytes are located in 
the trabeculae of erbb2 MO Tg(Tp1:d2GFP; myl7:H2A-mCherry) hosts 
(n = 10). In contrast to (h) non-transplanted erbb2 MO hearts (n = 16), 
(i) transplanted donor Tg(myl7:Cerulean) wild-type cardiomyocytes 
(arrowheads) can activate myocardial Notch activity (Tp1:d2GFP) in 
neighbouring erbb2 MO Tg(Tp1:d2GFP; myl7:H2A-mCherry) host 
cardiomyocytes (arrows, n = 10). Scale bar, 25 μm. 


dimethylsulfoxide (DMSO) from 60 to 72 hpf. DAPT treatment leads 
to increased numbers of trabeculating clones and conversely decreased 
numbers of non-trabeculating clones compared with DMSO-treated 
hearts; however, the total number of ventricular cardiomyocyte clones 
is not significantly different (Fig. 2r-t), further supporting the idea that 
Notch signalling segregates individual cardiomyocyte clones between 
the ventricular outer wall and inner trabecular layers. 

To examine whether Notch signalling acts cell-autonomously 
to control cardiomyocyte sprouting, we perturbed Notch signalling 
in individual cardiomyocytes during trabeculation by injecting 
hsp70l:loxp-mCherry-STOP-loxp-NICD-P2A-Emerald (hsp70l:RSN, 
Notch activating) or hsp70l:loxp-mCherry-STOP-loxp-dnSuH- 
P2A-Emerald’* (hsp70l:RSdnS, dominant negative Suppressor of 
Hairless/Notch repressing) switch plasmids into Tg(myl7:Cerulean)¹⁶; 
Tg(myl7:Cre) zebrafish embryos (Fig. 2u). Heat-shocking these 
injected fish from 60 to 72 hpf resulted in most Notch-activated 
NICD-P2A-Emerald+ cardiomyocytes remaining in the ventricular 
outer myocardial wall (Fig. 2w, z), whereas Notch-inhibited dnSuH- 
P2A-Emerald+ cardiomyocytes reside primarily in trabeculae 
(Fig. 2y, z). Heat-shocking injected control fish lacking Tg(myl7:Cre) 
generated mCherry+ cardiomyocytes that were distributed equally 
between both myocardial layers (Fig. 2v, x, z), altogether revealing a 
myocardial cell-autonomous role for Notch signalling. 

Because Neuregulin/Erbb2 and BMP10 signalling can promote 
cardiac trabeculation*!°!”, we investigated whether myocardial 
Notch may cross-talk with these signalling pathways to regulate 
cardiomyocyte selection between the ventricular wall layers. Inhibiting 
Erbb2 signalling with AG1478 treatment"? from 60 to 72 hpf prevents 
cardiac trabeculation and expression of Tp1:d2GFP in cardiomyocytes, 
although Tp1:d2GFP remains expressed in AV and OFT endocardial 
cells (Extended Data Fig. 9a, b). Consistent with these findings, both 
erbb2 morpholino (MO) knockdown and erbb2−/− mutant (erbb2°) 
Tg(Tp1:d2GFP) embryos, which exhibit similar trabecular defects'®!8, 
Figure 4 | The Notch ligand Jag2b mediates cooperative interactions 
between cardiomyocytes. a–d, Ventricular myocardial (MF20+) jag2b 
is expressed in (a, b) wild-type (WT) (n = 6/6 embryos) but not (c, d) 
erbb2−/− mutant hearts (n = 0/5). e–h, Compared with (e, f) WT 
controls (n = 0/10), (g, h) Tg(Tp1:d2GFP; myl7:mCherry) jag2b−/− 
mutants exhibit increased trabeculation and reduced myocardial Notch 
signalling at 72 hpf (n = 8/8). Yellow arrowheads, jag2b+ cardiomyocytes; 
white arrowheads, trabeculae; white arrows, Tp1:d2GFP+ cardiomyocytes; 
white and yellow asterisks, AV and OFT. Scale bar, 25 μm. Myocardial 
Notch signalling model: (i) endocardial Neuregulin/Nrg1 activates 
myocardial Erbb2 signalling, which cell-autonomously triggers myocardial 
sprouting and Jag2b expression (60–72 hpf). Jag2b activates Notch 
signalling in neighbouring cardiomyocytes, which cell-autonomously 
inhibits erbb2 expression and trabeculae formation (magnified area). 
j, Inhibiting Notch signalling allows all cardiomyocytes to express erbb2, 
respond to Neuregulin, and sprout and form trabeculae. k, Blocking Erbb2 
signalling prevents trabeculation, Jag2b expression, and Notch activation 
in neighbouring cardiomyocytes. 
also fail to display Notch reporter expression in the myocardium 
(Fig. 3a, b, d), supporting a requirement for Erbb2 signalling in the 
initiation of trabeculation and the activation of myocardial Notch sig- 
nalling. Notably, neither DAPT treatment nor heat-shock induction 
of dnMAML between 60 and 72 hpf, which alone increases trabec- 
ulation (Fig. 2), could rescue the relative lack of cardiac trabeculae 
in zebrafish with loss of Erbb2 function (Fig. 3c, e and Extended 
Data Fig. 9b-e). In contrast to the Erbb2 loss of function findings, 
embryos treated with Dorsomorphin'? from 60 to 72 hpf to inhibit 
BMP signalling still form cardiac trabeculae and express Tp1:d2GFP 
in the outer myocardial wall (Extended Data Fig. 4m-p) despite the 
abrogation of the BMP-reporter signal (BRE:d2GFP) in the trabecular 
layer (Extended Data Fig. 4j–l). However, by 7 dpf, these hearts 
display aberrant and stunted cardiac trabeculae compared with 
DMSO-treated fish (Extended Data Fig. 4q-s), corroborating a 
requirement for BMP signalling in the maintenance but not the 
initiation of cardiac trabeculation”®’. 

To explore whether Notch activation negatively regulates Erbb2 
signalling to prevent trabeculae formation, we examined erbb2 expression 
in 72 hpf Tp1:d2GFP hearts and discovered that erbb2 is expressed 
in many ventricular cardiomyocytes but diminished in Tp1:d2GFP+ 
cardiomyocytes (Extended Data Fig. 9f–j). In support of these findings, 
constitutive myocardial Notch activation by heat-shocking 
Tg(myl7:Cre; hsp70l:RSN) fish between 60 and 120 hpf results in the 
dramatic reduction of erbb2 myocardial expression (Extended Data 
Fig. 9k, l, o, p). In contrast, Notch-inhibited hearts treated with DAPT 
from 60 to 72 hpf exhibit increased erbb2 myocardial expression 
(Extended Data Fig. 9m, n, q, r). Thus, myocardial Notch signalling 
may block Neuregulin/Erbb2 signalling by downregulating erbb2 
expression to inhibit cardiomyocyte sprouting. 

Since Notch signalling has been shown to mediate cell fate posi- 
tion through lateral inhibition mechanisms®”!”’, we investigated 
whether Erbb2 signalling non-cell-autonomously activates myo- 
cardial Notch signalling in neighbouring cardiomyocytes. Thus, we 
created mosaic embryos by transplanting Tg(myl7:Cerulean) wild-type 
donor blastomeres into erbb2 or control MO injected Tg(Tp1:d2GFP); 
Tg(myl7:H2A-mCherry)²³ host embryos and assessed the ability 
of wild-type donor cells to contribute to the ventricular wall layers 
and activate myocardial Notch signalling. As previously reported!°, 
a greater percentage of donor-derived wild-type cardiomyocytes is 
present in the trabeculae of erbb2 knockdown embryos compared 
with control embryos (compare Fig. 3i with Fig. 3g; Extended Data 
Fig. 10a). Although non-transplanted erbb2 knockdown hearts fail 
to exhibit myocardial Notch activity (Fig. 3f, h), transplanted erbb2 
knockdown host hearts containing wild-type donor myocardial cells 
(myl7:Cerulean+) can activate myocardial Tp1:d2GFP expression 
(Fig. 3i and Extended Data Fig. 10b). Upon closer inspection, these 
host erbb2 knockdown Tp1:d2GFP+ cardiomyocytes (Fig. 3i, arrows) 
appear adjacent to donor wild-type myl7:Cerulean+ cardiomyocytes 
(Fig. 3i, arrowheads, and Extended Data Fig. 10c), supporting a role 
for Erbb2-responsive cardiomyocytes in activating Notch signalling in 
neighbouring cardiomyocytes. 

On the basis of these results, we searched for potential Notch ligands 
mediating the activation of Notch signalling in neighbouring cardio- 
myocytes and discovered that jag2b is expressed in select ventricular 
cardiomyocytes at 72 hpf when myocardial Notch signalling is activated 
(Fig. 4a, b). This ventricular myocardial jag2b expression is reduced in 
erbb2−/− mutant hearts (Fig. 4c, d), suggesting that Erbb2 signalling 
may activate Notch signalling in neighbouring cardiomyocytes through 
jag2b. In support of this possibility, we discovered that jag2b−/− mutant 
hearts exhibit not only increased trabeculation as observed in Notch- 
inhibited hearts but also reduced Tp1:d2GFP Notch reporter activity 
in the ventricular myocardium but not in the AV or OFT endocardium 
(Fig. 4e-h). Together these data support a model in which myocardial 
Erbb2 signalling non-cell-autonomously activates Notch signalling in 
neighbouring ventricular outer-wall cardiomyocytes through Jag2b, 
which in turn leads to the reduction of erbb2 expression, subsequent 
inhibition of Erbb2 signalling, and suppression of cardiomyocyte 
sprouting (Fig. 4i-k). 

Overall, these findings reveal a molecular mechanism whereby Notch 
and Erbb2 signalling coordinate social cellular interactions between 
cardiomyocytes that determine their morphological fate within the 
ventricular wall. Although previous studies have suggested that Notch 
signalling may be activated in the myocardium**”’, our zebrafish studies 
illuminate the precise role of myocardial Notch activity in forming 
the ventricular wall. Similar to the receptor tyrosine kinase (RTK)- 
Notch lateral inhibition signalling mechanisms that regulate epithe- 
lial tip and stalk cell formation during branching morphogenesis®”, 
myocardial Notch acts in concert with the RTK Erbb2 to segregate 
embryonic cardiomyocytes into two functionally distinct classes of 
cells: (1) sprouting cardiomyocytes that respond to Neuregulin via 
Erbb2 and (2) non-sprouting Notch-activated cardiomyocytes, 
in which Notch signalling inhibits erbb2 expression. These roles appear 
not to be pre-specified, but rather are determined by social interactions 
between cardiomyocytes. Furthermore, recent studies have reported 
human Notch genetic variants linked to a wide spectrum of congenital 
heart diseases including non-compaction cardiomyopathies**”°, which 
exhibit similar severe ventricular wall defects to those observed in our 
Notch studies. More broadly, our studies support a conserved role for 
intercellular cross-talk between RTKs and Notch signalling for allocat- 
ing cells within organ substructures and might be particularly relevant 
in developing strategies for human pluripotent stem-cell tissue-specific 
developmental and disease modelling or regenerative therapies. 
METHODS 


Zebrafish husbandry and strains. Zebrafish (Danio rerio) were raised under 
standard laboratory conditions at 28°C. All animal work was approved by 
the University of California at San Diego Institutional Animal Care and Use 
Committee. The following established transgenic and mutant lines were used: 
Tg(EPV.Tp1-Mmu.Hbb:d2GFP)""® (ref. 8) abbreviated as Tg(Tp1:d2GFP); 
Tg(EPV.Tp1-Mmu.Hbb:eGFP)""""4 (ref. 12) abbreviated as Tg(Tp1:eGFP); Tg(BRE- 
AAVmlp:d2GFP)""”? (ref. 19) abbreviated as Tg(BRE:d2GFP); Tg(hsp70l:dnMAML- 
GFP)*!° (ref. 13) abbreviated as Tg(hsp70l:dnM); Tg(kdrl:ras-mCherry)°°6 
(ref. 30); Tg(myl7:H2A-mCherry) (ref. 23); Tg(myl7:mCherry)*“” (ref. 31); 
Tg(myl7:Cerulean)®"? (ref. 16); Tg(myl7:eGFP-HRAS)*** (ref. 32) abbreviated 
as Tg(myl7:ras-eGFP); Tg(myl7:CreER)*"” (ref. 33); Tg(β-act2:Brainbow1.0L)P4? 
(ref. 4) abbreviated as Tg(priZm); Tg(hsp70l:loxp-mCherry-STOP-loxp-NICD- 
P2A-Emerald)S**! (ref. 14) abbreviated as Tg(hsp70l:RSN); Tg(β-act2:loxP-DsRed- 
STOP-loxP-eGFP)°**”* (ref. 33) abbreviated as Tg(β-act2:RSG); erbb2°° 
(ref. 18), and jag2b!">> (ref. 34). 

To generate the Tg(myl7:Cre)“** transgenic line, a 900-base-pair fragment of 
the myl7 promoter* was cloned upstream of the Cre recombinase gene into a 
multi-cloning site flanked by I-SceI sites in the pBluescript-SK vector. Standard 
I-SceI meganuclease transgenesis*° was used to create transgenic founders, 
which were screened for myocardial Cre recombinase activity by crossing to 
the Tg(β-act2:RSG)°8 line. Three independent founders were identified, all 
with similar levels of Cre recombinase activity and matching the expression of 
Tg(myl7:Cerulean)°!? (Extended Data Fig. 5a–d). A single representative founder 
was propagated further. 

To generate the Tg(ubi:loxp-mKate2-STOP-loxp-dnMAML-GFP)””* strain, 
abbreviated as Tg(ubi:RSdnM), Gateway cloning technology (Life Technologies) 
was used to conduct an LR recombination reaction with the pENTR5′_ubi*’, pME- 
loxp-mKate2-STOP-loxp, p3E-dnMAML-GFP entry vectors and the pDestTol2pA2 
destination vector**. The pME-loxp-mKate2-STOP-loxp entry vector was created 
by replacing the AmCyan complementary DNA in the pME-loxp-AmCyan- 
STOP-loxp*® vector with a complementary DNA encoding mKate2 (Evrogen) 
using In-Fusion HD cloning (Clontech Laboratories). The p3E-dnMAML-GFP 
entry vector was generated by conducting a BP recombination reaction between 
a PCR product encoding a fusion protein between dnMAML and GFP amplified 
from pME-dnMAML-GFP* and the Gateway Donor Vector pDONRP2R-P3. 
The assembled ubi:loxp-mKate2-STOP-loxp-dnMAML-GFP construct was 
co-injected with Tol2 transposase mRNA* into one-cell stage embryos to generate 
independent founders, which were screened for mKate2 and then for GFP upon 
Cre-mediated recombination. Founders with both mKate2 and GFP were 
propagated further. 
Embryonic immunofluorescence and live imaging studies. Wholemount 
immunofluorescence studies were performed as previously described", with the 
following modifications. After initial fixation, any pre-existing fluorescence was 
quenched by incubating embryos in 2 M HCl at 37°C for 30 min and washing with 
double-distilled H2O and phosphate-buffered saline with 0.1% Tween-20 (PBST). The 
antibodies used were anti-Mef2/C-21 (rabbit, Santa Cruz Biotechnology, 1:100), 
anti-MHC/MF20 (mouse, Developmental Studies Hybridoma Bank, 1:100) or 
anti-N-cadherin (rabbit, GeneTex, 1:100) followed by anti-rabbit IgG-Alexa 488 
(goat, Life Technologies 1:200). 

For embryonic studies of Notch activity, embryos containing the Notch reporters 
Tg(Tp1:d2GFP)""® or Tg(Tp1:eGFP)""" in combination with myocardial- 
expressed transgenes such as Tg(myl7:mCherry)*” or endothelial-expressed 
transgenes such as Tg(kdrl:ras-mCherry)°®° were imaged live’. The Tp1 promoter 
used in these Notch reporter transgenics consists of 12 RBPJ-binding sites and 
reports Notch activity throughout the embryo as previously published*!*. These 
embryos were embedded in 1% low melting agarose (Lonza) in a coverslip bottom 
culture dish (MatTek) and cardiac contraction was arrested using Tricaine-S (Sigma 
MS-222) just before imaging. 

To count the number of Tp1:d2GFP+ clusters in each heart, we used three- 
dimensional reconstructions (Nikon NIS Elements software) from confocal stacks 
of Tg(Tp1:d2GFP; myl7:mCherry) embryos (5–16 hearts per stage). To count the 
number of cardiomyocytes per Tp1:d2GFP+ cluster, Tg(Tp1:d2GFP; myl7:H2A- 
mCherry) embryos were used. Only cells expressing both H2A-mCherry and 
d2GFP were counted. We analysed five to ten hearts per stage to obtain the average 
number of cardiomyocytes in each Tp1:d2GFP+ cluster at each specified stage. 
Statistical analysis is described in the ‘Image processing and statistical analysis’ 
section below. 

To assess the correlation between nascent trabeculae and Tp1:d2GFP+ 
clusters at 72 hpf, trabeculae were identified in consecutive slices from a Z-stack and 
pseudo-coloured with magenta. A three-dimensional reconstruction was then 
used to generate the full view of the ventricle, where the number of nascent 
trabeculae and Tp1:d2GFP+ clusters could be counted in each heart. These data 
are represented in a scatter plot (Extended Data Fig. 1r) and used to overlay a 
linear regression line. 
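
As an illustration of the regression overlay described above, the following is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `linear_fit` and the per-heart counts are hypothetical and are not data from this study; the fitting software actually used is not stated in this section.

```python
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical per-heart counts, for illustration only.
clusters_per_heart = [2, 3, 4, 5, 6]
trabeculae_per_heart = [3, 4, 5, 6, 7]
slope, intercept = linear_fit(clusters_per_heart, trabeculae_per_heart)
```

The fitted slope and intercept define the line that would be drawn over the scatter plot of trabeculae against clusters.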

Adult immunofluorescence and imaging studies. Immunofluorescence studies 
were conducted on cryosections of adult zebrafish hearts. These hearts were 
cryoprotected, mounted, sectioned, and stained as performed previously*’. The 
following primary antibodies were used: anti-MHC/MF20 (mouse, Developmental 
Studies Hybridoma Bank, 1:100); anti-α-actinin (mouse, Diagnostic BioSystems, 
1:100); anti-Raldh2 (rabbit, Abmart, 1:100); and anti-GFP (chicken, Aves Labs, 
1:200). The following secondary antibodies were used: anti-mouse IgG-Alexa 405 
(goat, Life Technologies, 1:200), anti-mouse IgG-Alexa 594 (goat, Life Technologies, 
1:200), anti-rabbit IgG-Alexa 568 (goat, Life Technologies, 1:200) and anti-chicken 
IgG-Alexa 488 (goat, Life Technologies, 1:200). Alexa Fluor 594-conjugated wheat 
germ agglutinin (Life Technologies, 50 μg ml−1) was used to stain the extracellular 
matrix. DAPI (1 μg ml−1) staining was used to identify nuclei. Notably, we discovered 
that eGFP from the Tg(Tp1:eGFP)“"" transgene perdured for a longer period 
in the ventricular outer myocardial wall (Extended Data Fig. 3) than d2GFP from 
the Tg(Tp1:d2GFP)”"” transgene (Extended Data Fig. 1). 

Notch signalling studies. Notch inhibition studies were performed using DAPT 
(a chemical inhibitor of γ-secretase) or dnMAML mis-expression (dominant 
negative mastermind-like 1). DAPT: zebrafish embryos were incubated in 100 μM 
DAPT (Sigma) or 0.1% DMSO alone (control) at specified developmental 
stages and time intervals and then quickly washed (two or three times) with 
egg water (60 μg ml−1 Instant Ocean sea salts) for further analysis. The ability 
of 100 μM DAPT treatment to inhibit Notch signalling was validated by examining 
Tp1:d2GFP expression after DAPT treatment (Extended Data Fig. 2i–p). 
dnMAML: the Tg(hsp70l:dnM) line was used to globally express dnMAML at specified 
time points. Heat-shock induction was conducted by placing Tg(hsp70l:dnM) or 
wild-type siblings into a 37°C incubator for 30 min, followed by 3 min in a 42°C 
water bath. Embryos were heat-shocked twice every 24 h to maintain the induction 
of dnMAML-GFP throughout the embryo. This protocol was highly efficient at 
inducing dnMAML-GFP expression and produced minimal lethality. To inhibit 
Notch signalling in cardiomyocytes only, the Tg(ubi:loxp-mKate2-STOP-loxp- 
dnMAML-GFP) line was crossed with the Tg(myl7:Cre) line to produce embryos 
which express dnMAML-GFP only in the myocardium. Induction of dnMAML- 
GFP was verified by examining GFP fluorescence 5–6 h after heat shock or 
Cre-mediated recombination”. 

Notch activation was performed by expressing NICD. Heat shocking embryos 
containing both Tg(hsp70l:loxp-mCherry-STOP-loxp-NICD-P2A-Emerald)"* and 
Tg(myl7:Cre) transgenes produced NICD-P2A-Emerald only in the myocardium. 
Heat shock was performed as described above. 

Ventricular wall thickness was measured to quantify the effect of perturbing 
Notch signalling and was determined by drawing five representative lines 
perpendicular to the ventricular wall in a representative confocal slice. Thickness 
was measured as the distance along the line between the lateral and medial edge 
of the myocardial wall. All hearts were imaged in the same orientation and 
comparable confocal slices were chosen for analysis. Six hearts were measured 
for each condition. 
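
The measurement described above amounts to averaging the lengths of five line segments drawn across the wall. The sketch below shows that arithmetic under stated assumptions: the endpoint coordinates, the pixel calibration `um_per_pixel` and the function name `wall_thickness_um` are all hypothetical, and the software used for the original measurements is not named in this paragraph.

```python
import math
from statistics import mean


def wall_thickness_um(lines, um_per_pixel):
    """Mean length of the measurement lines, converted from pixels to micrometres.

    Each line is a pair of (x, y) pixel coordinates: one endpoint on the lateral
    edge of the myocardial wall and one on the medial edge.
    """
    if not lines:
        raise ValueError("at least one measurement line is required")
    lengths_um = [
        math.dist(lateral, medial) * um_per_pixel for lateral, medial in lines
    ]
    return mean(lengths_um)


# Five hypothetical measurement lines from one confocal slice (pixel coordinates).
example_lines = [
    ((0, 0), (3, 4)),
    ((10, 0), (13, 4)),
    ((20, 5), (23, 9)),
    ((30, 5), (34, 8)),
    ((40, 2), (45, 2)),
]
thickness = wall_thickness_um(example_lines, um_per_pixel=0.5)
```

Averaging several perpendicular lines per heart reduces the influence of any single locally thick or thin region on the reported value.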

To determine the effect of altering Notch signalling on cardiomyocyte cell 
numbers within the ventricular outer wall and trabecular layers, ventricular 
cardiomyocyte nuclei were counted from hearts exposed to specified experi- 
mental conditions using three-dimensional reconstructions of confocal slices 
from embryos with myl7:H2A-mCherry or from embryos stained with the Mef2 
antibody. Mef2 immunostaining was used in embryos containing transgenes 
with fluorophores that overlapped with H2A-mCherry, such as ubi:RSdnM 
or hsp70l:RSN. For these analyses, the cells within the trabeculae could be 
separated from the ventricular outer wall using the post-image processing 
procedure described in the ‘Image processing and statistical analysis’ section below. 
Using this procedure, we calculated the number of cardiomyocyte nuclei in the 
total ventricle and the number of cardiomyocyte nuclei within the trabeculae. 
The number of cardiomyocyte nuclei in the ventricular outer wall was calculated 
by subtracting the number of cardiomyocytes in the trabeculae from the total. 
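
The subtraction described above can be written as a short helper. This is only a sketch: the function name `layer_counts` and the example counts are hypothetical, and the trabecular fraction is an extra illustrative quantity rather than one reported in this section.

```python
def layer_counts(total_nuclei, trabecular_nuclei):
    """Split a ventricular nuclei count into outer-wall and trabecular layers.

    The outer-wall count is derived by subtraction, as in the text:
    outer wall = total ventricle - trabeculae.
    """
    if not 0 <= trabecular_nuclei <= total_nuclei:
        raise ValueError("trabecular count must lie between 0 and the total")
    outer_wall_nuclei = total_nuclei - trabecular_nuclei
    trabecular_fraction = trabecular_nuclei / total_nuclei if total_nuclei else 0.0
    return outer_wall_nuclei, trabecular_fraction


# Hypothetical counts for a single heart, for illustration only.
outer_wall, fraction = layer_counts(total_nuclei=200, trabecular_nuclei=50)
```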

Trabeculae area was measured from a confocal slice of a ventricle containing 
a cytoplasmic fluorophore such as myl7:mCherry or ubi:RSdnM or hsp70l:RSN. 
Confocal slices at the level of the AV canal were analysed. Non-trabecular tissue 
in these images was masked manually and then the total number of fluorescent 
pixels was measured using the IDL program (Research Systems). All images were 
taken at the same dimensions. Ventricle area was determined by measuring the 
total pixels outlined in the ventricle region using ImageJ software. 
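
The pixel-counting step can be sketched as follows. The original analysis used IDL; this Python version is a stand-in for illustration, and the intensity threshold, the tiny example image and the function name `trabecular_area_pixels` are assumptions, since the text does not state how a pixel was classed as fluorescent.

```python
def trabecular_area_pixels(image, keep_mask, threshold):
    """Count fluorescent pixels that lie inside the unmasked (trabecular) region.

    image      -- 2D list of pixel intensities from one confocal slice
    keep_mask  -- 2D list of booleans; False marks manually masked
                  non-trabecular tissue
    threshold  -- intensity above which a pixel counts as fluorescent
                  (an assumed criterion)
    """
    if len(image) != len(keep_mask):
        raise ValueError("image and mask must have the same number of rows")
    return sum(
        1
        for image_row, mask_row in zip(image, keep_mask)
        for value, keep in zip(image_row, mask_row)
        if keep and value > threshold
    )


# Tiny hypothetical 3 x 3 slice, for illustration only.
example_image = [
    [0, 50, 200],
    [120, 130, 10],
    [255, 90, 180],
]
example_mask = [
    [True, True, False],
    [True, True, True],
    [False, True, True],
]
area = trabecular_area_pixels(example_image, example_mask, threshold=100)
```

Because all images were acquired at the same dimensions, raw pixel counts of this kind can be compared directly between hearts.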

Clonal analysis. Cardiomyocyte clones were genetically labelled by combining the 
myl7:CreER and priZm (β-act2:Brainbow1.0L)⁴ transgenes and then treating with 
4-hydroxytamoxifen (4-HT, Sigma). Specifically, zebrafish embryos with these 
transgenes were treated at 48 hpf, when the zebrafish heart consists of a single 
cardiomyocyte-thick wall and is looped but has not initiated cardiac trabeculation, 
with 10 μM 4-HT or 0.1% ethanol (control) for 6 h at 28°C and then washed with 
fresh egg water several times. The dose and length of incubation of 4-HT was 
titrated to create small distinct clones (one or two cells) before trabeculation 
(Extended Data Fig. 8). The total numbers of cardiomyocyte clones were counted 
from three-dimensional reconstructions of confocal slices from hearts containing 
myl7:CreER and priZm transgenes. Visualization and counting of clones solely 
within the trabeculae were analysed using the post-imaging processing procedure 
described in the ‘Image processing and statistical analysis’ section below. 

Mosaic analysis by DNA injection. To create the hsp70l:loxp-mCherry-STOP- 
loxp-dnSuH-P2A-Emerald plasmid (abbreviated as hsp70l:RSdnS), the dominant 
negative Suppressor of Hairless'> (dnSuH) DNA construct was PCR amplified with 
flanking AscI and SacII restriction sites. After sequence verification, this dnSuH 
product was subcloned into the hsp70l:loxp-mCherry-STOP-loxp-NICD-P2A- 
Emerald¹⁴ construct (hsp70l:RSN), replacing the NICD sequence and generating 
the hsp70l:loxp-mCherry-STOP-loxp-dnSuH-P2A-Emerald (hsp70l:RSdnS) construct 
for subsequent injection studies (see below). To generate cardiomyocyte 
clones with constitutively activated or inhibited Notch signalling, the I-SceI enzyme 
was co-injected with either the hsp70l:RSN plasmid (25 pg) or the hsp70l:RSdnS 
(25 pg) plasmid into one-cell stage embryos containing Tg(myl7:Cre; myl7:Cerulean) 
or only Tg(myl7:Cerulean). Embryos were then heat-shocked (as described 
above) at 60 hpf and imaged at 72 hpf. Cardiomyocytes containing either plasmid 
were detected by the co-expression of mCherry or Emerald and Cerulean. The 
location of Notch-altered Emerald+ cardiomyocyte clones in either the ventricular 
outer wall or the trabeculae was determined using the method described within 
the ‘Image processing and statistical analysis’ section below. 
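
Clone locations of this kind were compared with Fisher's exact test (Fig. 2 legend). The following is a minimal sketch of a two-sided Fisher's exact test on a 2 × 2 table in plain Python. The pairing used in the example, Notch-activated clones (13 of 14 in the outer wall) against the corresponding control (11 of 21 in the outer wall), is an assumption made for illustration; the exact contingency table and sidedness used by the authors are not stated in this section, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. Notch-altered, control); columns are locations
    (e.g. outer wall, trabeculae). The P value sums the hypergeometric
    probabilities of all tables that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Unnormalised hypergeometric weight of a table with k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k)

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    observed = weight(a)
    # Integer weights allow an exact comparison with the observed table.
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / comb(n, row1)


# Assumed table: 13 outer-wall and 1 trabecular clone (Notch-activated)
# versus 11 outer-wall and 10 trabecular clones (control).
p_value = fisher_exact_two_sided(13, 1, 11, 10)
```

Under this assumed pairing the P value falls below 0.05, in line with the significance threshold quoted in the legend.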

Erbb2 and BMP loss of function studies. The activity of Erbb2, a tyrosine kinase 
receptor, was inhibited using (1) homozygous erbb2"° mutants!®, (2) a splice mor- 
pholino targeting erbb2 (erbb2 MO)", or (3) the tyrosine kinase inhibitor AG1478 
(Calbiochem). (1) The erbb2"*? homozygous mutant embryos were identified by 
the previously characterized aberrant cardiac morphology"®. (2) The erbb2 MO was 
previously characterized and shown to be specific!®. We injected 570 pg of the erbb2 
morpholino (erbb2 MO) or a mismatched control morpholino (control MO) into 
one-cell stage embryos as previously described”. (3) AG1478 (5 μM; Calbiochem) 
or 0.1% DMSO (control) was added to embryos as previously described’. After 
incubation, embryos were washed extensively with egg water for further analysis. 
The erbb2 MO injections or AG1478 incubations phenocopied the erbb2°*? mutant 
phenotype (Fig. 3 and Extended Data Fig. 9). 

To investigate the relationship between Notch signalling and erbb2, erbb2 MO 
embryos or erbb2"? mutant embryos were incubated with 100 μM DAPT (Sigma) 
or 0.1% DMSO at 60 hpf as described above. In complementary experiments, 
Tg(hsp70l:dnMAML-GFP) embryos were injected with the erbb2 MO and heat- 
shocked at 60 hpf as described above. 

To inhibit BMP signalling, embryos were incubated with 30 μM of 
Dorsomorphin (Sigma) as previously described“. The efficacy of Dorsomorphin 
was verified by examining its effect on the BMP reporter, Tg(BRE:d2GFP). 

Cell transplantation studies. Blastomere transplantation was performed at the 
mid-blastula stage as previously described. Ten to twenty cells were removed 
from mid-blastula donor Tg(myl7:Cerulean) embryos and placed along the 
margin of either control MO or erbb2 MO host Tg(Tp1:d2GFP; myl7:H2A- 
mCherry) embryos. Transplanted embryos in which donor cells contributed to the 
heart were imaged at 72 hpf. Whether donor cells contributed to the ventricular
outer wall or the trabeculae, and their location relative to Tp1:d2GFP⁺ cells,
were assessed in single confocal slices or in three-dimensional reconstructions
using the post-imaging procedures described within the ‘Image processing and
statistical analysis’ section below.

In situ hybridization expression analyses. Fluorescent in situ hybridization studies
of erbb2 and jag2b were performed as described in the ViewRNA in situ
hybridization 1-Plex kit protocol (Affymetrix Panomics) with the following
modifications. After the initial fixation, existing fluorescence was quenched as
described for embryonic immunofluorescence above. Embryos were then subjected to
protease digestion (protease QF, 1:100) at 40°C for 30 min, followed by PBST
washes and re-fixation in 4% PFA at 25°C for 20 min. After additional PBST washes,
hybridization was performed with erbb2 probes (Affymetrix Panomics VF1-16871, 1:50
dilution) or jag2b probes (Affymetrix Panomics VF1-18462, 1:50 dilution) at 40°C
overnight. Hybridized embryos were then washed in PBST and stepped through
pre-labelling solutions. Embryos were then incubated with label probe-AP solution
(1:1,000) for 30 min at 40°C, washed again, transferred to an AP-enhancer solution
and then transferred to fast red solution (one fast red substrate tablet in 5 ml
naphthol buffer) for 30 min at 40°C. After PBST washes, embryos were incubated
in anti-MHC/MF20 antibody (mouse, Developmental Studies Hybridoma Bank,
1:100) or anti-GFP antibody (chicken, Aves Labs, 1:200) overnight at 4°C. After
PBST washes, embryos were incubated in anti-mouse IgG-Alexa 488 (goat,
Life Technologies, 1:200) or anti-chicken IgG-Alexa 488 antibodies (goat, Life
Technologies, 1:200) for 1 h at 25°C. Finally, embryos were washed and mounted
for imaging. PBST washing consisted of three washes for 15 min each at 25°C. For
gfp mRNA expression analysis, wholemount in situ hybridization studies were
performed as previously described.

Image processing and statistical analysis. All images were obtained using a
Nikon C2 confocal microscope and processed using Nikon NIS Elements
software and ImageJ as previously described. Scale bars for all images represent
25 µm. Measurements comparing the ventricular outer wall with the trabeculae
were performed with post-image processing of confocal slices. Visualization of all
cardiomyocytes and clones within the ventricle (comprising both the ventricular
outer wall and the trabeculae) was performed using three-dimensional reconstructions
(Nikon NIS Elements) of confocal slices. However, to visualize only the
trabeculae, cardiomyocytes within the ventricular outer wall in individual confocal
slices were identified by their outer location and orientation and then masked manually.
Three-dimensional reconstructions with these masked confocal slices then allowed 
the visualization and measurement of trabeculae alone. Measurements for the 
ventricular outer wall alone were calculated by subtracting the measurements of 
the trabeculae from the total ventricular cardiomyocytes. No statistical methods 
were used to predetermine sample size. Animals were assigned to experimental 
groups using simple randomization, without investigator blinding. Unpaired 
two-tailed Student's t-tests or Fisher’s exact tests were used to determine statistical 
significance. P < 0.05 was considered to be statistically significant, as indicated by 
an asterisk. Error bars, s.e.m. 
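
The outer-wall subtraction and the s.e.m. error bars described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' analysis pipeline: the function names and the per-embryo counts are hypothetical, and the s.e.m. is taken as the sample standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import mean, stdev


def outer_wall_counts(total_counts, trabecular_counts):
    """Outer-wall value per embryo = total ventricular value - trabecular value."""
    if len(total_counts) != len(trabecular_counts):
        raise ValueError("need one total and one trabecular value per embryo")
    return [total - trab for total, trab in zip(total_counts, trabecular_counts)]


def mean_and_sem(values):
    """Return (mean, s.e.m.), with s.e.m. = sample s.d. / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("s.e.m. needs at least two measurements")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical cardiomyocyte counts for three embryos (illustration only).
total = [200, 210, 190]
trabecular = [50, 60, 46]

outer = outer_wall_counts(total, trabecular)
outer_mean, outer_sem = mean_and_sem(outer)
print(outer, outer_mean, round(outer_sem, 3))
```

The same `mean_and_sem` helper could be applied to any per-embryo measurement before plotting error bars.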
Extended Data Figure 1 | Notch signalling is dynamically activated
in the endocardium and myocardium during heart development.
a-f, Confocal slices of Tg(Tp1:d2GFP; myl7:mCherry) hearts reveal that
Notch signalling is in the ventricular endocardium (yellow arrows) but
not in the myocardium at 24 hpf (n = 11) and 36 hpf (n = 8), but (g-i)
becomes restricted to the AV and OFT endocardium by 48 hpf (n = 12).
j-o, Tg(Tp1:d2GFP; kdrl:ras-mCherry) confocal imaging confirms
that Tp1:d2GFP is expressed in the ventricular endocardium at (j-l)
24 hpf (n = 8) but becomes localized to the AV or OFT endocardium
as well as non-endocardial cells in the outer ventricular myocardial
wall (white arrows) by (m-o) 96 hpf (n = 10). p, q, Three-dimensional
confocal reconstructions of the (p) exterior and (q) interior regions of
72 hpf Tg(Tp1:d2GFP; myl7:mCherry) hearts reveal that Notch-activated
Tp1:d2GFP⁺ cells are present in cardiomyocyte clusters (green, numbers
in parentheses) and excluded from nascent cardiac trabeculae
(pseudo-coloured magenta, numbers). r, Graph shows that the numbers of cardiac
trabeculae (x axis) and Tp1:d2GFP⁺ cardiomyocyte clusters (y axis)
are similar within the ventricle (n = 30) at 72 hpf. Size of dots indicates
the number of embryos with a particular number of trabeculae and
Tp1:d2GFP⁺ clusters. Line represents a linear regression fitted to the data.
s, t, Myocardial anti-MHC/MF20 immunostaining of Tg(Tp1:d2GFP)
hearts reveals a loss of myocardial Tp1:d2GFP Notch reporter signal
in 30 and 90 dpf hearts (n = 5 hearts per stage). White arrows, likely
Tp1:d2GFP⁺ cardiomyocytes; yellow arrows, Tp1:d2GFP⁺ endocardial
cells; white and yellow asterisks, AV and OFT. Dashed line in s outlines
ventricle. V, ventricle; A, atrium. Scale bar, 25 µm.
Extended Data Figure 2 | DAPT treatment validates that the Notch
reporter Tp1:d2GFP monitors dynamic Notch signalling more closely
than Tp1:eGFP, and reveals opposing roles of Notch signalling on
trabeculation at different developmental stages. a-d, At 48 hpf,
(a) Tp1:d2GFP expression is restricted to the AV and OFT endocardium
(n = 8/8 embryos) whereas (c) Tp1:eGFP is expressed in the ventricular,
AV and OFT endocardium (n = 6/6). However, gfp mRNA is primarily
expressed in the AV and OFT regions in both (b) Tg(Tp1:d2GFP;
myl7:mCherry) (n = 10/10) and (d) Tg(Tp1:eGFP; myl7:mCherry)
embryos (n = 5/5), revealing that Tp1:d2GFP expression most closely
matches Notch reporter activity. e-h, After 24 h DAPT treatments of (e, f)
Tg(Tp1:d2GFP; myl7:H2A-mCherry) and (g, h) Tg(Tp1:eGFP; myl7:H2A-mCherry)
embryos at 72 hpf, (f) Tp1:d2GFP is more diminished
throughout the heart at 96 hpf (n = 8/10) compared with (h) Tp1:eGFP
(n = 6/7), confirming that the Tp1:d2GFP signal more faithfully recapitulates
Notch signalling dynamics. m-p, Tg(Tp1:d2GFP; myl7:mCherry)
hearts DAPT-treated from 60 to 72 hpf exhibit increased trabeculation
(white arrowheads) and diminished Tp1:d2GFP Notch reporter activity
(n = 12/16) compared with (i-l) DMSO-treated hearts (n = 0/20). However,
(r) Tg(myl7:mCherry) hearts DAPT-treated from 20 to 48 hpf exhibit
fewer trabeculae at 120 hpf (n = 12/15) than (q) DMSO-treated
hearts (n = 0/20). s, Graph represents trabeculae/total ventricular area
in embryos treated with DMSO or DAPT in q and r. White and yellow
arrows, myocardial and endocardial Notch reporter activity; white
arrowheads, trabeculae; white and yellow asterisks, AV and OFT.
Scale bar, 25 µm. Mean ± s.e.m. *P < 0.05 by Student's t-test.
Extended Data Figure 3 | Tp1:eGFP labels the ventricular outer wall
during early cardiac development, which becomes the distinctive
ventricular primordial myocardium in adults. Using the Tp1:eGFP
Notch reporter, which exhibits greater fluorescence perdurance than
Tp1:d2GFP, we performed limited fate mapping of Notch-activated cardiac
cells during ventricular morphogenesis. a, b, Tp1:eGFP is expressed not
only in ventricular cardiomyocytes (red nuclei, white arrows) at 72 hpf
but also throughout the ventricular endocardium because of eGFP
perdurance (yellow arrows) (n = 12). c, d, Although diminishing in the
ventricular endocardium (yellow arrows) at 96 hpf (n = 14), Tp1:eGFP
expands in the outer ventricular myocardial wall (white arrows),
yet is notably absent from myocardial trabeculae (white arrowheads).
e, f, By 30 and 45 dpf (n = 6, n = 5), Tp1:eGFP remains in the peripheral
ventricular (primordial) myocardial layer, which is one cardiomyocyte
thick (myl7:H2A-mCherry⁺/red and MF20⁺/blue), but is reduced in the
ventricular but not the AV or OFT endocardium. g-i, At 60 dpf (n = 5),
(h) new cardiomyocytes (cortical layer, yellow arrowheads) form over
the Tp1:eGFP⁺ primordial myocardium (white arrows) at the ventricular
myocardial base (yellow box in g) and extend towards the apex where
(i) Tp1:eGFP⁺ cardiomyocytes (white arrows) still remain the outermost
layer of the ventricular myocardium (white box in g). j, However, by
90 dpf (n = 5), this new cortical myocardial layer (yellow arrowheads)
spreads over the apical Tp1:eGFP⁺ ventricular primordial myocardium
(white arrows). k-m, In adult hearts (90 dpf), Tp1:eGFP is primarily
found in the (k, n = 5) myl7:H2A-mCherry⁺ primordial myocardium
but not in the (l, n = 5) endocardium marked by kdrl:ras-mCherry, nor
(m, n = 3) epicardium marked by Raldh2 localization. n-t, Adult hearts
(6 months) were further examined to assess the cellular attributes of
the primordial layer. n, Anti-MHC/MF20 immunostaining confirms
that Tp1:eGFP⁺ cardiac cells are myocardial (n = 5). o, Anti-α-actinin
immunostaining reveals that trabecular (white arrowheads) and cortical
(yellow arrowheads) cardiomyocytes display organized sarcomeric
structures but the Tp1:eGFP⁺ primordial cardiomyocytes (arrows) do not
(n = 7). p-t, Wheat germ agglutinin (WGA) staining shows that (p, q)
the Tp1:eGFP⁺ primordial myocardial layer is surrounded by extensive
extracellular matrix (n = 5) and that (r-t) Tg(myl7:ras-eGFP) primordial
cardiomyocytes display a thin cellular morphology compared with other
ventricular cardiomyocytes (n = 10). q, An X-Z reconstruction of confocal
stacks from Tp1:eGFP and wheat germ agglutinin stainings at the dashed
line shown in p. b, d, h-i, t, Magnifications of the boxed areas in a, c, g,
s, respectively. White and yellow arrows, myocardial and endocardial
Tp1:eGFP; white and yellow arrowheads, trabeculae and cortical layer;
white and yellow asterisks, AV and OFT. Scale bar, 25 µm.
Extended Data Figure 4 | BMP signalling, which marks trabeculae,
is required for expanding but not initiating trabeculae formation
and has no effect on myocardial Notch activity. a-l, Tg(BRE:d2GFP;
myl7:mCherry) hearts were treated with (a-c) DMSO, (d-f) DAPT,
(g-i) AG1478, or (j-l) Dorsomorphin at 60 hpf and imaged at 72 hpf.
a-c, DMSO-treated hearts express the BRE:d2GFP BMP reporter in
trabeculae (arrowheads) and in the AV myocardium (yellow arrows,
n = 11/11 embryos). d-f, DAPT-treated hearts exhibit increased
trabeculation and BRE:d2GFP expression in these forming trabeculae
(arrowheads, n = 9/12). g-i, AG1478-treated hearts fail to form trabeculae
(n = 9/10) and only express the BRE:d2GFP BMP reporter in the AV
myocardium (yellow arrow). j-l, Dorsomorphin-treated hearts form
cardiac trabeculae (arrowheads) but fail to express the BRE:d2GFP
BMP reporter in both cardiac trabeculae and the AV myocardium
(n = 10/12). m-p, Treating Tg(Tp1:d2GFP; myl7:mCherry) embryos with
Dorsomorphin from 60 to 72 hpf did not affect the initiation of trabeculae
(arrowheads) nor the activation of myocardial Notch signalling (white
arrows, n = 13/16) compared with treating with DMSO (see Extended
Data Fig. 2i-l). q, r, Although Tg(myl7:mCherry) hearts treated with (q)
DMSO or (r) Dorsomorphin from 60 hpf to 7 dpf form similar numbers of
trabeculae (arrowheads), Dorsomorphin-treated hearts display trabeculae
that are stunted/reduced in size (n = 12/15) compared with DMSO-treated
control hearts (n = 0/15). s, Graph reveals a significant reduction in the
trabecular/ventricular area ratio in Dorsomorphin-treated fish compared
with DMSO-treated controls. Arrowheads, trabeculae; yellow arrows,
AV myocardium; white arrows, Tp1:d2GFP⁺ myocardium. White
asterisks, AV. Mean ± s.e.m. *P < 0.05 by Student's t-test. Scale bar, 25 µm.
Extended Data Figure 5 | Altering myocardial Notch signalling affects
ventricular size and wall thickness but not total number of ventricular
cardiomyocytes. a-d, The Tg(myl7:Cre) transgenic line used to
specifically perturb Notch signalling in the myocardium was validated by
confirming that Cre expression is restricted to the myocardium. Activity
of myl7:Cre, as visualized by (c) GFP expression from the switch line,
β-act2:RSG, exclusively overlaps with (b, d) myl7:Cerulean expression
at 120 hpf (n = 10 embryos). Quantitative analyses of (e) ventricular size
and (f) wall thickness performed on confocal images from Fig. 2a-h
reveal that myocardial Notch signalling restricts ventricular size while
promoting ventricular wall thickness. e, Ventricular size measurements
were normalized to respective controls for each condition. f, Individual
measurements (dots) of myocardial thickness were taken across the outer
curvature of the ventricle (n = 30 measurements; 6 measurements were
taken per embryo, 5 embryos per condition). Dashed line represents
the ventricular wall thickness that distinguishes trabeculated myocardial
thickness from ventricular outer wall myocardial thickness in control
hearts. Crosses denote mean and s.e.m. g-p, Quantitative analysis of
(g) trabecular cardiomyocytes and (p) total ventricular cardiomyocytes
was performed by counting myocardial nuclei labelled with
myl7:H2A-mCherry or anti-Mef2 immunostaining using embryos from
Fig. 2i-p for g, or from three-dimensional reconstructions in h-o for p.
In g, the number of trabecular/total ventricular cardiomyocytes was
used to calculate the percentage of trabecular cardiomyocytes for each
condition. In p, total ventricular cardiomyocytes were normalized to
respective controls for each condition. n, Number of embryos analysed per
condition. Mean ± s.e.m. *P < 0.05 by Student's t-test. NS, not significant.
Scale bar, 25 µm.
Extended Data Figure 6 | Myocardial Notch activation can inhibit
the formation and expansion of cardiac trabeculae at various cardiac
developmental stages. a-g, Tg(myl7:Cre; hsp70l:RSN) and Tg(hsp70l:RSN)
(control) embryos were heat-shocked (HS) during various developmental
time windows as indicated and imaged at 7 dpf to assess the effects of
constitutive myocardial Notch signalling on cardiac trabeculae formation.
a, Red arrows in schematic indicate the time points at which embryos in
the corresponding panels were heat-shocked. b, Control Tg(hsp70l:RSN)
embryos heat-shocked from 60 hpf to 7 dpf ubiquitously express mCherry
but do not overexpress myocardial NICD. They form cardiac trabeculae
(arrowheads) similar to wild-type embryos (control, n = 14/15).
c, However, Tg(myl7:Cre; hsp70l:RSN) embryos heat-shocked from
60 hpf to 7 dpf overexpress NICD-P2A-Emerald throughout the
myocardium and fail to form cardiac trabeculae (n = 9/12). Although
Tg(myl7:Cre; hsp70l:RSN) embryos heat-shocked at (d) 80 hpf, (e) 96 hpf,
and (f) 120 hpf form trabeculae, these embryos exhibit stunted/smaller
trabeculae after heat-shocking (n = 9/10, 10/14, and 12/16, respectively).
g, Graph of trabeculae/total ventricular area of heat-shocked embryos
from b-f, showing that myocardial Notch over-activation inhibits the
progression of cardiac trabeculae formation. h, i, Although heat-shocking
Tg(myl7:Cre; hsp70l:RSN) from 60 to 120 hpf initially inhibits trabeculae
formation, (i) the ventricular myocardium (detected by anti-MHC/MF20
immunostaining, magenta) can still form trabeculae, albeit at
reduced numbers (n = 4/5) by 30 dpf after stopping NICD overexpression
compared with (h) heat-shocked Tg(hsp70l:RSN) hearts (control,
n = 0/8). HS, heat-shock; white arrowheads, trabeculae. Scale bar, 25 µm.
Mean ± s.e.m. *P < 0.05 by Student's t-test.
Extended Data Figure 7 | Notch signalling regulates cardiomyocyte
cell junctions during cardiac trabeculae formation. a-d, In DMSO-treated
(control) 72 hpf wild-type hearts, N-cadherin is localized at cell
junctions of cardiomyocytes within the ventricular outer wall (arrows) but
redistributes away from these cell-cell contacts in cardiomyocytes that
extend into the lumen to form trabeculae (arrowheads) (n = 12/12).
e-h, Notch inhibition by DAPT treatment promotes N-cadherin
redistribution and results in increased trabeculation (n = 8/11).
m-p, Conversely, myocardial Notch activation by heat shocking (HS)
Tg(myl7:Cre; hsp70l:RSN) leads to diminished N-cadherin redistribution
and reduced trabeculation (n = 7/10) compared with (i-l) heat-shocked
Tg(hsp70l:RSN) control hearts (n = 0/10). Nascent cardiac trabeculae were
pseudo-coloured magenta in c, g and k. b, d, f, h, j, l, n, p, Magnifications
of boxed areas in a, c, e, g, i, k, m, o, respectively. Arrowheads, N-cadherin
redistributed from cell-cell contacts; arrows, N-cadherin at cell-cell
contacts within outer wall. Scale bar, 25 µm.
Extended Data Figure 8 | Tamoxifen treatment of Tg(myl7:CreER;
priZm) embryos at 48 hpf labels adjacent individual cardiomyocytes
with combinations of distinct fluorescent colours. Tg(myl7:CreER;
priZm) embryos were treated with 4-HT at 48 hpf and confocal imaged
at 60 hpf before the initiation of cardiomyocytes forming trabeculae.
Individual cardiomyocytes (arrowheads) are labelled with distinct
combinations of fluorescent proteins allowing for tracking of specific
cardiomyocyte clones (n = 6). White arrowheads, cardiomyocytes;
V, ventricle; A, atrium; white asterisk, AV. Scale bar, 25 µm.
Extended Data Figure 9 | Notch and Erbb2 signalling pathways form
a feedback loop during cardiac trabeculation. a, b, Compared with
(a) DMSO-treated Tg(Tp1:d2GFP; myl7:mCherry) (control) embryos,
(b) inhibiting Erbb2 function with AG1478 from 60 to 72 hpf blocks
trabeculation and myocardial Notch signalling (n = 14/17), confirming
erbb2 MO and mutant phenotypes. c, However, Notch inhibition using
DAPT cannot reverse the AG1478/Erbb2 inhibition effect on trabeculae
formation (n = 11/12). d, e, Consistent with these results, (d) control
MO-injected Tg(hsp70l:dnM; myl7:mCherry) embryos expressing
heat-shock induced dnMAML from 60 to 72 hpf display increased trabeculation
(arrowheads, n = 9/11); (e) however, erbb2 MO-injected embryos
expressing heat-shock induced dnMAML fail to display trabeculae
(n = 9/12) as similarly observed in erbb2 MO-injected embryos alone
(Fig. 3). f-j, The erbb2 fluorescent in situ hybridization and GFP
co-immunostaining performed on 72 hpf Tg(Tp1:d2GFP) hearts reveal
that erbb2 is expressed in an intermittent pattern across the ventricular
wall and is specifically diminished in Tp1:d2GFP⁺ cells (arrows) (n = 6/6).
l, p, Heat-shocked (HS) Tg(myl7:Cre; hsp70l:RSN) hearts, which exhibit
constitutively activated myocardial Notch signalling (NICD) from 60 to
120 hpf, minimally express erbb2 in the myocardium (n = 8/11) compared
with (k, o) heat-shocked Tg(hsp70l:RSN) control hearts (n = 0/20) at
120 hpf. Compared with (m, q) DMSO-treated control hearts (n = 0/10),
(n, r) hearts in which Notch was inhibited by DAPT treatment from 60 to 72 hpf
exhibit increased myocardial erbb2 expression as well as more trabeculae
at 72 hpf (n = 8/10), supporting the idea that Notch signalling inhibits
erbb2 expression. h, i-j, Magnifications of boxed areas in g, h, respectively.
Arrowheads, trabeculae; arrows, Tp1:d2GFP⁺ cardiomyocytes; white and
yellow asterisks, AV and OFT. Scale bar, 25 µm.
Extended Data Figure 10 | Transplanted wild-type cardiomyocytes
non-cell-autonomously activate Notch signalling in erbb2 morphant
host cardiomyocytes. a, On the basis of mosaic embryo studies from
Fig. 3f-i, wild-type donor cardiomyocytes contribute equally to the outer
ventricular wall (14/26 clones) or the trabeculae (12/26 clones) when
transplanted into control MO host embryos (n = 12 embryos). However,
when wild-type donor cells are transplanted into erbb2 MO host embryos
(n = 10 embryos), they contribute more to the trabecular layer (19/23
clones) than to the ventricular outer wall (4/23 clones, P < 0.05 by Fisher's
exact test). b, On the basis of mosaic embryo studies from Fig. 3f-i,
transplanting wild-type donor cells increases the number of erbb2 MO
host cardiomyocytes expressing Tp1:d2GFP (n = 10 embryos) compared
with non-transplanted erbb2 MO embryos (n = 16 embryos), but had
no effect on the number of control MO host cells expressing Tp1:d2GFP
(n = 12 embryos) compared with non-transplanted controls (n = 11
embryos). c, Quantitative data for Fig. 3f-i reveal that transplanted
wild-type donor cardiomyocytes are primarily adjacent to host Tp1:d2GFP⁺
cardiomyocytes in erbb2 MO hearts (n = 10 embryos). Mean ± s.e.m.
*P < 0.05 by Student's t-test. NS, not significant.
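
The clone counts reported for panel a form a 2 × 2 table, so the Fisher's exact comparison can be reproduced with a small standard-library sketch. The legend does not say which software or which two-sided convention was used; the function name below is hypothetical, and the two-sided rule (summing every table that is no more probable than the observed one) is an assumption, albeit a common one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# Clone counts from panel a: rows are host type,
# columns are (outer wall clones, trabecular clones).
control_mo = (14, 12)
erbb2_mo = (4, 19)

p_clones = fisher_exact_two_sided(*control_mo, *erbb2_mo)
print(f"P = {p_clones:.4f}")
```

Under these assumptions the P value falls below the 0.05 threshold stated in the legend.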
Widespread transmission of independent cancer
lineages within multiple bivalve species

Michael J. Metzger, Antonio Villalba, Maria J. Carballal, David Iglesias, James Sherry, Carol Reinisch,
Annette F. Muttray, Susan A. Baldwin & Stephen P. Goff

Most cancers arise from oncogenic changes in the genomes of
somatic cells, and while the cells may migrate by metastasis, they
remain within that single individual. Natural transmission of
cancer cells from one individual to another has been observed in
two distinct cases in mammals (Tasmanian devils¹ and dogs²,³), but
these are generally considered to be rare exceptions in nature. The
discovery of transmissible cancer in soft-shell clams (Mya arenaria)⁴
suggested that this phenomenon might be more widespread. Here
we analyse disseminated neoplasia in mussels (Mytilus trossulus),
cockles (Cerastoderma edule), and golden carpet shell clams
(Polititapes aureus) and find that neoplasias in all three species
are attributable to independent transmissible cancer lineages. In
mussels and cockles, the cancer lineages are derived from their
respective host species; however, unexpectedly, cancer cells in
P. aureus are all derived from Venerupis corrugata, a different species
living in the same geographical area. No cases of disseminated
neoplasia have thus far been found in V. corrugata from the same
region. These findings show that transmission of cancer cells in
the marine environment is common in multiple species, that it has
originated many times, and that while most transmissible cancers
are found spreading within the species of origin, cross-species
transmission of cancer cells can occur.

Disseminated neoplasia, or haemic neoplasia, a leukaemia-like
disease, occurs with high prevalence in multiple bivalve species⁵,⁶. Here
we investigate the possibility that cancers in three species could be
attributed to transmissible cancer cells, and whether these cancers
are restricted to the species of origin or can undergo cross-species
transmission.

Mussels (M. trossulus; Fig. 1a) are subject to disseminated neoplasia
in the Pacific Northwest Coast⁷,⁸, and evidence of common
polymorphisms in neoplasias suggested that these might represent a
transmissible cancer⁹. Twenty-eight mussels (M. trossulus) collected from West
Vancouver were screened for neoplasia by drawing haemolymph and
analysing haemocytes for the rounded, non-adherent morphology
of neoplastic cells. Two were identified with high levels of neoplastic
cells. We sequenced part of the mitochondrial cytochrome c oxidase I
(mtCOI) gene in host tissue and neoplastic haemocytes from the two
diseased animals and four normal animals to test whether the
neoplastic cells were derived from the host individuals or exhibited a distinct
genotype, the hallmark of transmissible neoplasia. While solid tissue
and haemocyte genotypes within each normal animal were always
identical, solid tissue and neoplastic haemocyte genotypes of the diseased
animals were discordant (Fig. 1b and Extended Data Fig. 1). Moreover,
the same single nucleotide polymorphisms (SNPs) were found in
neoplastic cells of the two diseased individuals, indicating that the cancer
cells were not of host origin and suggesting that they arose from a single
clonal origin. EF1α gene sequences also revealed that the genotypes of
the host cells and neoplastic haemocytes were discordant, and that the
genotypes of the neoplastic cells of the two different animals were again
identical to each other (Fig. 1c).
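
The logic of this test, discordance between tissue and haemocyte genotypes within an animal combined with a shared haemocyte genotype across diseased animals, can be sketched in a few lines of Python. The sample names and SNP strings below are hypothetical placeholders, not the sequenced alleles, and the function names are illustrative only.

```python
def discordant_animals(genotypes):
    """Map each animal to True if its haemocyte genotype differs from its tissue."""
    return {
        animal: sample["tissue"] != sample["haemocyte"]
        for animal, sample in genotypes.items()
    }


def shared_neoplastic_genotype(genotypes):
    """Return the haemocyte genotype shared by all discordant animals, else None."""
    neoplastic = {
        sample["haemocyte"]
        for sample in genotypes.values()
        if sample["tissue"] != sample["haemocyte"]
    }
    return neoplastic.pop() if len(neoplastic) == 1 else None


# Hypothetical SNP calls at four marker positions (illustration only).
genotypes = {
    "normal_1": {"tissue": "ACGT", "haemocyte": "ACGT"},
    "normal_2": {"tissue": "ATGT", "haemocyte": "ATGT"},
    "diseased_1": {"tissue": "ACGA", "haemocyte": "GCTT"},
    "diseased_2": {"tissue": "ATGT", "haemocyte": "GCTT"},
}

flags = discordant_animals(genotypes)
shared = shared_neoplastic_genotype(genotypes)
print(flags, shared)
```

A single shared genotype among discordant animals is the pattern expected for a clonal, transmissible lineage; several different discordant genotypes would instead return `None` here.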

To determine whether this transmissible cell line was widespread in
the M. trossulus population, 250 mussels were collected from Vancouver Island,
and seven potentially diseased individuals were analysed. In one
neoplastic sample (MW81), the haemocyte and tissue genotypes did not
match, and the haemocytes contained the same mtCOI allele and the
same EF1α major and minor alleles found in the other samples (Fig.
1b, c and Extended Data Fig. 1). These genotypes strongly indicate the
existence of an M. trossulus-derived transmissible cancer lineage
circulating in the wild population.

High prevalence of disseminated neoplasia has been observed in
two species of bivalves on the Galician Coast: cockles (C. edule)¹⁰,¹¹
and golden carpet shell clams (P. aureus, previously named Venerupis
aurea)¹². The disease in cockles (C. edule) exhibits one of two distinct
morphologies, termed types A and B¹³,¹⁴. We collected about 150
cockles (C. edule; Fig. 2a), and examined the genotypes of solid tissue
and haemocytes of six normal individuals and six with high (>75%)
or moderate (15-75%) amounts of neoplastic cells in the haemolymph.
Nine polymorphic microsatellite loci were amplified from
normal animals, and allele sizes from tissue and haemocytes of each
normal animal all matched, but allele sizes in haemocytes and tissue
of diseased animals were discordant (Fig. 2b, c). In a phylogenetic tree
based on microsatellite alleles, the neoplastic haemocyte genotypes
did not group with the host tissue genotypes, consistent with
transmissible cancer, and instead clustered into two distinct branches,
suggesting two independent cancer lineages (Fig. 2d). We sequenced
the mtCOI gene and identified several SNPs that were present only in
lineage 2, and not in any of the normal animals. No unique SNPs were
identified in the mtCOI region sequenced in lineage 1.

The microsatellite alleles and mtCOI SNPs suggest two independ- 
ent cancer lineages, but these data are also consistent with two sub- 
groups that have diverged from a single transmissible cancer lineage. 
To investigate this question, we sequenced an approximately 3-kilobase 
intron-spanning region in EF1α from six normal individuals and two
diseased individuals from each lineage (Fig. 2e). Both neoplastic 
haemocyte alleles were different from the tissue alleles in all diseased 
individuals, and the two alleles of the neoplasm in each diseased indi- 
vidual were nearly identical to the two alleles of the other individual 
in the lineage. However, both alleles in lineage 1 cells were different 
from those in lineage 2. Moreover, the neoplastic alleles were more 
closely related to some normal alleles than they were to the alleles of 
the alternative lineage. These data strongly suggest an independent 
origin of these two lineages. 

Histological and morphometric examination showed that all three
samples in lineage 1 were type A (characterized by a looser arrangement
of neoplastic cells in the connective tissue, with pleomorphic nuclei)
and all three samples in lineage 2 were type B (tighter arrangement,
with smaller cells than type A and rounded, smaller nuclei; Fig. 2f-i
and Extended Data Table 1). Altogether, these results argue that two
distinct lineages of transmissible cancer, with distinct morphologies
and genotypes, arose independently in cockles and are circulating in
this species.

Figure 1 | Analysis of tissue and haemocyte genotypes of normal
mussels (M. trossulus) and mussels with disseminated neoplasia using
mitochondrial and nuclear DNA markers. a, Representative M. trossulus,
with ruler for scale. b, Sequencing of mtCOI (bases 99-759) was conducted
for both host solid tissue (T) and haemocytes (H) of normal (n = 4) and
neoplastic mussels (n = 3). Open boxes mark SNPs that differ from allele
A. Filled boxes mark discordance between neoplastic haemocyte and
tissue genotypes. Sequences are numbered using AY823625.1. c, Maximum
likelihood phylogenetic tree based on sequences of an intron-containing
region of EF1α, with bootstrap values over 50 shown and a scale bar
showing genetic distance. M. edulis (614 base pairs (bp), EU684203.1)
was used as an outgroup. In normal animals, the sequences of tissue
and haemocyte DNA were identical and are presented together (T&H;
black circles). For neoplastic animals, the tissue (T; open red circles) and
haemocyte (H; filled red circles) alleles differed. Letters ‘a’ and ‘b’ denote
multiple alleles in heterozygous individuals, and major and minor alleles
from neoplastic haemocytes are marked. See Extended Data Fig. 1 for further
details.

We also examined golden carpet shell clams (P. aureus; Fig. 3a),
which are present in the same habitat as several other bivalves,
including the closely related pullet shell clam (V. corrugata; Fig. 3b). Of 74
P. aureus individuals tested, 9 had high levels of disease and 22 had
low-to-medium disease. We sequenced regions of mtCOI, the ribosomal
DNA internal transcribed spacer (rDNA ITS), and EF1α from tissue
and haemocyte DNA of six highly diseased and six normal P. aureus
individuals, and again found that the genotypes of neoplastic cells were
nearly identical and did not match those of their hosts (Fig. 3c-e). In
contrast to the transmissible cancers in other species, however, the
sequences of the neoplastic cells were highly dissimilar from normal
host sequence (only 78.4-78.5% identical in the mtCOI locus and
89.3-93.2% identical at nuclear loci, ignoring insertions and
deletions). The neoplastic genotypes were instead near-perfect matches
to the sequences of V. corrugata (98.6-7% and 99.3-100% identical,
respectively).

To confirm that detection of the V. corrugata sequence reflected the
neoplastic cells observed morphologically, we used species-specific
EF1α quantitative PCR (qPCR) to quantify the fraction of cells that
were derived from each species in the tissue and haemocyte samples
(Fig. 3f). As expected, no V. corrugata sequences were detected in
normal P. aureus animals, but high frequencies of V. corrugata DNA were
detected in all six individuals diagnosed with high levels of neoplasia
and lower amounts in most individuals diagnosed with medium and
low levels. The small number of individuals diagnosed with low and
medium disease in which V. corrugata DNA was not detected could
have primary host-derived neoplasia, a different transmissible cancer
lineage, too low an abundance of tumour DNA to detect, or even a
small amount of normal cells with an unusually rounded
morphology. In diseased animals, the highest levels of V. corrugata sequence
were detected in haemocytes, and lower levels were present in tissues,
probably because of infiltration of cancer cells via the circulation. We
conclude that the cancer spreading in the P. aureus population resulted
from cross-species transmission of cancer cells of V. corrugata origin.
It is noteworthy that no cases of disseminated neoplasia have thus far
been found in V. corrugata from the same region, despite analysis of
hundreds of clams, suggesting that the V. corrugata cancer cells do not
initiate disease in the species of origin and were only observed to
colonize P. aureus animals.

We previously identified an LTR-retrotransposon, Steamer, which 
was amplified in the transmissible neoplasia lineage in the soft-shell 
clam (M. arenaria)'*. Using degenerate primers, we identified at least 
one Steamer-like element (SLE) in each of the three species studied 
here. In each case, copy numbers were variable among individuals, 
but no massive amplification of these particular retrotransposons was 
observed in any of the transmissible cancers assayed (Extended Data 
Fig. 2). This suggests that Steamer-like retrotransposon amplification 
is not essential in development of transmissible clones. 

Our results indicate that transmission of contagious cancer cells is 
a widespread phenomenon in the marine environment, with multiple 
independent lineages developing in multiple species in four bivalve 
families. Along with the recent identification of a second independent 
lineage of transmissible cancer in Tasmanian devils’, these findings 
confirm that, under suitable conditions, the development of transmissible 
cancer can occur multiple times, and suggest that some species 
may be more susceptible to development of transmissible neoplasia 
than others. 

Spontaneous haemic neoplasia must occur at some frequency for 
transmissible lineages to arise, but cases of transmissible cancer appear 
to outnumber spontaneous disease, at least in the species investigated 
so far (that is, all neoplastic cases tested in cockles, golden carpet shell 
clams, and three of nine samples in mussels, in addition to all samples 
in the previous report of soft-shell clams‘). 

Transmissible cancers appear largely restricted to the species of ori- 
gin in nature. While canine transmissible venereal tumour has been 
experimentally transplanted to coyotes”, jackals”!, and foxes”?-*4, no 
examples of natural transmission beyond dogs have been reported”». 
Disseminated neoplasias of molluscs have only been transplanted to 
members of the same species (soft-shell clams, mussels, and others>®). 
Attempts to transfer M. arenaria neoplasia through water exposure 
to both M. arenaria and M. trossulus only resulted in transfer 
of disease to M. arenaria”®, and injection of M. trossulus cancer 
cells into multiple bivalve species only resulted in engraftment in 
M. trossulus?’. Our finding that multiple cancer lineages are most often 
found to spread within the original host species is consistent with 
these previous experiments, and suggests that there may be 
species-specific restriction factors that prevent engraftment into divergent 
hosts. 

In this context, our observation of cross-species transmission, a 
cancer from one species spreading through another, is particularly striking. 
The two species involved, V. corrugata and P. aureus, both belong to the 
Veneridae family (some studies suggest that 
P. aureus should belong to the Venerupis genus’), and coexist in the 
same beds. This close relationship may have aided the transmission. 

Figure 2 | Analysis of lineages of transmissible neoplasia in cockles 
(C. edule). a, Representative C. edule (scale bar, 20 mm). 
b, c, Microsatellite loci were amplified from solid tissue (T) and haemocyte 
(H) DNA from normal (samples N1-N6) and diseased (M1-M3 and H1-H3) 
cockles. Products of a representative locus (CeATC1-5) are shown. 
Triangles mark samples corresponding to lineages 1 (red) and 2 (purple). 
d, Neighbour-joining phylogenetic tree based on nine microsatellite loci 
(see Source data). Bootstrap values over 50 are shown and scale bar is 
based on comparison across all 104 observed alleles. mtCOI SNPs unique 
to lineage 2 are marked. e, Maximum likelihood tree of EF1α (2725-4249 
bp, spanning four introns), rooted at the midpoint, with bootstrap 
values above 50 and scale bar showing genetic distance. Letters ‘a’ and 
‘b’ denote multiple alleles from heterozygous samples, and alleles from 
neoplastic lineages 1 and 2 are marked. f, g, Histology of cockles H3 and 
H2, representative of type A neoplastic cells. h, i, Histology of cockle H1, 
representative of type B neoplastic cells. 


It is notable that despite ongoing surveillance of the bivalves of the 
Galician coast since the 1990s, only one V. corrugata has been iden- 
tified as harbouring disseminated neoplasia (S. Darriba, personal 
communication), while the prevalence of disease in P. aureus is quite 
high!? (12% of the individuals collected for the current study had high 
levels of neoplasia and 42% had some detectable level of disease). This 
would be explained if the cancer originated in V. corrugata, led to 
selective loss of susceptible animals, and left the current population of 
V. corrugata resistant to engraftment and disease. The spread of trans- 
missible cancers and the evolutionary pressure to defend against them 
may be a strong and underappreciated selective force in the evolution 
of multicellular organisms. 
Figure 3 | Phylogenetic analysis of neoplastic cells in P. aureus 
and quantification of V. corrugata cell engraftment in normal and 
diseased animals. a, Representative P. aureus clams (scale bar, 20 mm). 
b, Representative V. corrugata clams (scale bar, 20 mm). c-e, Maximum 
likelihood trees of sequences from solid tissue (T) and haemocyte (H) 
DNA from normal (N1-N6) and highly diseased (H1-H6) P. aureus 
and normal V. corrugata, based on (c) mtCOI (658 bp), (d) rDNA ITS 
(454-373 bp), and (e) EF1α (148 bp, letters ‘a’ and ‘b’ denote multiple 


In summary, our findings show that transmission of cancers between 
individuals within a species may be more common than previously 
assumed, and that transmission can even occur between species. We 
now know of eight transmissible cancers in nature: one lineage in dogs, 
two lineages in Tasmanian devils, and five lineages circulating in four 
species of molluscs, including one example of cross-species trans- 
mission. Further studies may well reveal additional examples. The 
recent report of a cancer composed of transformed tapeworm cells 
in an immunocompromised AIDS patient” highlights the possibility 
that transmissible tumours could arise in humans. These transmissi- 
ble cancers constitute a distinct class of infectious agent and show the 
remarkable ability of tumours to acquire new phenotypes that promote 
their own survival and propagation. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to diagnosis during 
sequencing and morphology assessment. 

Collection and diagnosis of M. trossulus. Mussel (M. trossulus) specimens 5-6 cm 
long were collected from the intertidal zone at low tide (noon) on 18 April 2015 at 
Copper Beach (49° 22′ 41″ N, 123° 16′ 44″ W, West Vancouver, British Columbia, 
Canada). They were transported to the laboratory in aerated seawater from Copper 
Beach. In the laboratory, 0.5-1.0 ml haemolymph was removed from the posterior 
adductor muscle. For each individual, one drop of haemolymph was placed on a 
poly-L-lysine-coated slide and left for 10-15 min to allow the cells to spread and 
attach to the slide. Thereafter, the slide was viewed under a Zeiss Axiostar light 
microscope at ×40 magnification. Normal (non-neoplastic) specimens were those 
with greater than 90% cells with normal appearance: that is, agranular or granular 
haemocytes with spread pseudopodia (ref. 30). Fully leukaemic (diseased) specimens were 
those with prolific amounts (>90%) of round, non-adherent cells. Haemolymph 
was added to 1.5 ml Eppendorf tubes and spun at 900g for 3 min. After the 
supernatant was withdrawn, the cells were re-suspended in absolute ethanol before 
shipping. Excised tissues (mantle, foot, and gills) were preserved in absolute ethanol 
before shipping. Of 28 individuals collected from West Vancouver, two had high 
levels of neoplastic disease. 

A second set of M. trossulus samples was collected from Esquimalt, Vancouver 
Island, British Columbia, Canada. Haemolymph was extracted and cell morphology 
was used for diagnosis as above. Haemocyte and solid tissue samples were fixed 
in ethanol before DNA extraction. Of 250 individuals collected from Esquimalt, 
9 were scored as potentially moderately or highly diseased, with 7 samples available 
for analysis. 
Collection and diagnosis of C. edule and P. aureus. Cockles (C. edule) were 
collected from an intertidal bed named O Sarrido (42° 30’ N, 8° 49’ W) and golden 
carpet shell clams (P. aureus) and pullet shell clams (V. corrugata) were collected 
from a subtidal bed named O Bohido (42° 32’ N, 8° 51’ W); both shellfish beds are 
located in the ria of Arousa (Galicia, northwest Spain). Once in the laboratory of 
Centro de Investigacións Mariñas, each cockle and clam was notched through the 
shell margin close to the posterior adductor muscle and haemolymph (as much as 
possible) was collected from the posterior adductor muscle using a 2-ml syringe 
with a 21-gauge needle. A small quantity (about 100 µl) of haemolymph was used 
to produce a cell monolayer on a slide by cytocentrifugation (92g, 5 min, 4°C), 
which was fixed, stained with a Hemacolor (Merck) kit, and examined with light 
microscopy for diagnosis of disseminated neoplasia; the remaining haemolymph 
was preserved in absolute ethanol for molecular analysis. After haemolymph 
collection, molluscs were shucked and a small piece of mantle (about 5 mm × 5 mm) 
was removed and preserved in absolute ethanol for molecular analysis; additionally, 
an approximately 5-mm-thick section of meat, containing gills, visceral mass, foot, 
and mantle lobes, was fixed in Davidson's solution and embedded in paraffin. 
Sections of 5 µm thickness were stained with Harris’s haematoxylin and eosin (ref. 31). 
Histological sections were examined under light microscopy for histopathological 
analysis. 

Morphometric analysis was conducted on histological sections of cockle 
(C. edule) samples. Neoplastic cells (both A and B types) had a unique morphology, 
and were clearly distinguished from normal cells, with much larger overall size, and 
much larger nucleus. The longest diameters of the cell and the nucleus of at least 
ten neoplastic cells for each individual were measured by direct examination of 
histological sections with light microscopy using a reticle. Two-tailed t-tests were 
used for comparisons of morphometric data from different individuals and types. 

Cockles (C. edule) and golden carpet shell clams (P. aureus) were ranked according 
to a scale of disease severity based on haemolymph diagnosis: non-affected (N0, 
or N); low severity (N1, or L), when individuals showed proportions of neoplastic 
cells lower than 15% in the haemolymph cell monolayers; moderate severity (N2, 
or M), when the proportion ranged from 15% to 75%; and high severity (N3, 
or H), when the proportion was higher than 75% (ref. 32). Seventy-four golden 
carpet shell clams (P. aureus) were collected with 12 diagnosed with light, 10 with 
moderate, and 9 with heavy neoplasia. About 150 cockles (C. edule) were collected 
and a subset was analysed both for disease and for morphological type of neoplastic 
cells. Of the 30 in this subset, two had type A neoplastic cells (one light and the 
other moderate) and one had low levels of type B neoplastic cells. We have also 
collected and analysed hundreds of pullet clams (V. corrugata) from multiple beds 
throughout Galicia from 1988 to the present, including about 100 from the same 
bed in which the samples of P. aureus were collected for this study (O Bohido). So 
far, we have not observed any V. corrugata samples with neoplastic disease. One 
V. corrugata individual from a different location has been identified by a different 
researcher as harbouring disseminated neoplasia, with a different morphology 
than that observed in P. aureus neoplasia (S. Darriba, personal communication). 
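
The severity scale described above can be sketched as a small classifier. This is an illustrative reading of the thresholds in the text, not the authors' scoring procedure: the function name is hypothetical, and treating a proportion of exactly zero as non-affected and placing the 15% and 75% boundaries in the moderate class are assumptions.

```python
def severity_grade(neoplastic_fraction: float) -> str:
    """Map the proportion of neoplastic cells in a haemolymph monolayer
    (0.0 to 1.0) to the severity letters used in the text: N, L, M or H."""
    if not 0.0 <= neoplastic_fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if neoplastic_fraction == 0.0:
        return "N"  # non-affected (assumed: no neoplastic cells seen)
    if neoplastic_fraction < 0.15:
        return "L"  # low severity: below 15%
    if neoplastic_fraction <= 0.75:
        return "M"  # moderate severity: 15% to 75%
    return "H"      # high severity: above 75%
```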


DNA extraction. DNA was extracted from ethanol-fixed haemocytes using 
DNeasy Blood and Tissue Kit (Qiagen). DNA extraction of tissues used the same 
kit, but included an additional step to reduce the amount of PCR-inhibiting 
polysaccharides. After tissue lysis, 63 µl of buffer P3 was added to lysate and allowed 
to precipitate for 5 min. Lysate was spun 10 min at full speed at 4°C, and the resulting 
supernatant was mixed with buffer AL for 10 min at 37 °C, then mixed with ethanol 
and added to the column, continuing with the standard protocol. 

mtCOI, EF1α, ITS, Steamer-like PCR. Primers and annealing temperatures 
used are listed in Extended Data Table 2. PfuUltra II Fusion HS DNA Polymerase 
(Agilent) was used to amplify 10 ng of genomic DNA for 35 cycles. Extension was 
at 72°C for 15 s for mtCOI for all bivalves (ref. 33). For P. aureus, ITS was amplified (using 
primers modified from refs 34, 35) with extension for 30 s. Despite multiple copies 
of ITS in genomic DNA, a single sequence was observed for each normal P. aureus 
and V. corrugata, and a single pair of host/neoplasm sequences was observed in 
each diseased individual. PCR for EF1α followed the same program, with extension 
for 20 s in mussels (M. trossulus) and clams (P. aureus and V. corrugata) and 1 min 
30s in cockles (C. edule). For amplification of Steamer-like elements, a degenerate 
primer pair was designed to match conserved regions in reverse transcriptase and 
integrase of the LTR-retrotransposon, Steamer, and 10ng of DNA was amplified 
for 35 cycles, annealing at 55°C for 20s and extending for 1 min at 72°C. PCR 
products were directly sequenced. When multiple alleles could not be resolved 
by direct sequencing, PCR products were cloned using the Zero Blunt TOPO Kit 
(Invitrogen), and at least six colonies were sequenced. Sequences were aligned with 
Clustal W, with some manual adjustment. Primer-binding regions were excluded 
from analysis and are not included in the sizes listed as sequenced. Maximum 
likelihood phylogenetic trees were generated using PhyML 3.0 (ref. 36), using the 
HKY85 substitution model, with 100 bootstrap replicates, treating gaps in the 
alignment as missing data. Trees were visualized using FigTree version 1.4, with 
addition of markers at the branch termini. Each phylogenetic tree based on align- 
ment of sequence at a single locus (Figs 1c, 2e and 3b-d) includes a scale bar that 
shows genetic distance (0-1) based on the frequency of nucleotide divergence. All 
sequences are available in GenBank (accession numbers KX018521-KX018605). 
Analysis of microsatellite loci from C. edule. Microsatellites were amplified using 
primers for 12 loci, reported previously to be polymorphic in cockle (C. edule) 
populations!>. Of the 12 primer pairs, 9 pairs amplified products in all cockle DNA 
samples, with 1-2 alleles observed in normal samples and 1-4 neoplastic alleles 
observed. These were used in all further analyses, with fluorescent modifications 
on the 5’ end of the forward primers as listed (Extended Data Table 2). KOD pol- 
ymerase (Millipore) was used to prevent ambiguity of untemplated addition of A 
residues by Taq polymerase. Products were run on an agarose gel for visualization. 
Allele sizes were identified to single-base precision by fragment analysis using 
fluorescent primers (6-FAM, PET, NED, and VIC) using a 3730xl Genetic Analyzer 
with the LIZ-500 size standard (Applied Biosystems, operated by Genewiz). Peaks 
were called using Peak Scanner 2.0 (Applied Biosystems), rounding sizes to whole 
bases, with some manual adjustment to keep alleles of the same size together. 

We used the R package Poppr'® to generate distance matrices and phylogenetic 
trees using Bruvo’s method with the infinite alleles model. The value of c (repeat 
size) was set to 0.00001 to calculate distances using the band sharing model, in 
which alleles are either identical sizes (distance = 0) or non-identical (distance = 1). 
Each allele was coded as a dominant marker!’, with a single variable for each allele 
observed at each position (104 total markers from 9 loci). For each allele (a single 
size at a single locus), a sample is observed to be present, absent, or the information 
missing. For example, a sample with sizes 195 and 199 at locus A would be marked 
as ‘present’ for marker ‘A-195’ and ‘A-199’, but ‘absent’ for all other sizes at that 
locus (like ‘A-203’ and ‘A-207’). This analysis method ignores the uncertainty in 
the copy number of each locus in aneuploid cancer cells and allows for ambiguity 
at particular alleles. In cases where the haemocyte and tissue genotypes could be 
clearly differentiated, all alleles were analysed (H1 and H3). In some cases (M1, M2, 
M3, and H2), cancer-specific alleles could be detected in the haemocyte samples, 
but represented less than 50% of the total haemocyte DNA, and the host tissue 
alleles obscured one or two positions (depending on whether the normal genotype 
was homozygous or heterozygous at that locus). In these cases, since the tissue 
allele obscured a potential neoplastic allele, that size was coded as ‘missing’ for the 
haemocyte genotype. The total distance between any two samples is calculated 
as the average of the pairwise differences at all allele sizes observed (104 pairwise 
comparisons if no data are missing). In pairwise comparisons for each allele, two 
samples either have a distance of 0 (meaning that the particular allele size is either 
present in both samples or absent in both samples) or a distance of 1 (meaning 
that the particular allele size is present in one sample but absent in the other). If 
data are missing for one or both, then that specific comparison is dropped with no 
contribution to the total distance between the two samples. The genetic distances 
calculated are based on the alleles observed at the nine loci, so the absolute value 
of the scale itself is therefore dependent on the number of observed alleles that are 
included in the analysis. Source data for the generation of the phylogenetic tree lists 
all observed alleles coded as present (1), absent (2), or missing (0) for each sample. 
As an alternative analysis method, the data were analysed using Bruvo’s method 
as polyploid data with nine loci, using only the individuals that could be confi- 
dently identified. In this alternative analysis, the topology of the resulting tree was 
not different—all nodes with bootstrap values above 50 were maintained. In this 
analysis, the genetic distance between the representatives of lineage 1 (H3) and 
lineage 2 (H1) was 0.676, and the distances to the closest normal sample were 0.595 
(H3 to N5) and 0.648 (H1 to N1 and N2). 
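
The band-sharing distance described above can be illustrated with a short sketch. It follows the present (1), absent (2), missing (0) coding given for the Source data, but it is a plain re-statement of the calculation in the text, not the Poppr implementation the authors used; the function and constant names are hypothetical.

```python
PRESENT, ABSENT, MISSING = 1, 2, 0  # coding used in the Source data


def band_sharing_distance(sample_a, sample_b):
    """Average pairwise difference over all allele markers.

    Each sample is a sequence of codes, one per allele marker (for
    example 'A-195', 'A-199', ...), in the same order for both samples.
    A marker contributes 0 if both samples agree (both present or both
    absent) and 1 if they disagree; markers that are missing in either
    sample are dropped from the comparison.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must be coded over the same markers")
    compared = 0
    differing = 0
    for a, b in zip(sample_a, sample_b):
        if a == MISSING or b == MISSING:
            continue  # no contribution to the total distance
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no markers with data in both samples")
    return differing / compared


# Markers at a hypothetical locus A: A-195, A-199, A-203, A-207
sample_1 = [PRESENT, PRESENT, ABSENT, ABSENT]   # sizes 195 and 199
sample_2 = [ABSENT, PRESENT, PRESENT, ABSENT]   # sizes 199 and 203
distance = band_sharing_distance(sample_1, sample_2)  # 2 of 4 markers differ
```

Because missing markers are dropped rather than counted, a sample with an obscured allele is compared over fewer markers instead of being penalized for the gap.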
qPCR. Species-specific qPCR of the EF1α locus in P. aureus and V. corrugata was 
done using FastStart Universal SYBR Green Master Mix (Roche). Primers were 
designed to amplify the same region, with the 3’ end of both forward and reverse 
primers on sites that differ between the two species (Extended Data Table 2). 
Standard curves for each primer set were generated using two control plasmids 
(one containing the P. aureus EF1α fragment and P. aureus SLE fragment, the other 
containing the fragments from V. corrugata). EF1α fragments were amplified using 
conserved primers (Extended Data Table 2). Fragments were cloned with a Zero 
Blunt TOPO Kit (Invitrogen), and plasmids were linearized with NotI before qPCR. 
Standard curve samples (10*-10* copies per reaction) and experimental samples 
(2.5 ng per reaction) were done in triplicate. No amplification was detected with the 
P. aureus primers on the V. corrugata plasmid or with the V. corrugata primers on 
P. aureus plasmid (using up to 10^7 copies per reaction). The fraction of V. corrugata 
in each sample was calculated as the copy number using V. corrugata-specific prim- 
ers divided by copy number of P. aureus- plus V. corrugata-specific amplification. 
Quantification of Steamer-like elements in the three species was performed by 
the same method, using primers in the RT-IN region of the retrotransposon (on 
the basis of sequence obtained through amplification using degenerate primers) 
and control primers amplifying a region in EF1α. All primers used to amplify SLEs 
and EF1α from genomic DNA for cloning into standards and primers for qPCR 
are listed in Extended Data Table 2. For each species, a single control plasmid 
containing both the EF1α and SLE fragments was used. 
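
The two ratios used in the qPCR analysis can be written out as a brief sketch. The function names are hypothetical, and the standard-curve conversion assumes the usual log-linear relation between quantification cycle and copy number (Ct = slope × log10(copies) + intercept), which the text does not spell out. Reading SLE/EF1α as copies per haploid genome also assumes that EF1α is present once per haploid genome.

```python
def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Convert a quantification cycle to copy number using a standard curve
    of the form ct = slope * log10(copies) + intercept (assumed form)."""
    if slope == 0:
        raise ValueError("standard-curve slope must be non-zero")
    return 10 ** ((ct - intercept) / slope)


def corrugata_fraction(vc_copies: float, pa_copies: float) -> float:
    """Fraction of V. corrugata DNA: V. corrugata-specific copies divided by
    the sum of P. aureus- and V. corrugata-specific copies."""
    total = vc_copies + pa_copies
    if total <= 0:
        raise ValueError("no amplification detected with either primer set")
    return vc_copies / total


def sle_copies_per_haploid_genome(sle_copies: float, ef1a_copies: float) -> float:
    """Ratio of SLE to EF1-alpha copies measured in the same DNA sample."""
    if ef1a_copies <= 0:
        raise ValueError("EF1-alpha copy number must be positive")
    return sle_copies / ef1a_copies
```

For example, a sample yielding 300 copies with the V. corrugata primers and 100 with the P. aureus primers would be scored as 75% V. corrugata.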


30. Muttray, A. F., Schulte, P. M. & Baldwin, S. A. Invertebrate p53-like mRNA 
isoforms are differentially expressed in mussel haemic neoplasia. Mar. Environ. 
Res. 66, 412-421 (2008). 

31. Howard, D., Lewis, E., Keller, B. & Smith, C. NOAA Technical Memorandum NOS 
NCCOS Vol. 5 (NOAA/National Centers for Coastal Ocean Science, 2004). 

32. Diaz, S., Cao, A., Villalba, A. & Carballal, M. J. Expression of mutant protein p53 
and Hsp70 and Hsp90 chaperones in cockles Cerastoderma edule affected by 
neoplasia. Dis. Aquat. Organ. 90, 215-222 (2010). 

33. Folmer, O., Black, M., Hoeh, W., Lutz, R. & Vrijenhoek, R. DNA primers for 
amplification of mitochondrial cytochrome c oxidase subunit I from diverse 
metazoan invertebrates. Mol. Mar. Biol. Biotechnol. 3, 294-299 (1994). 

34. Oliverio, M. & Mariottini, P. Contrasting morphological and molecular variation 
in Coralliophila meyendorffii (Muricidae, Coralliophilinae). J. Molluscan Stud. 67, 
243-245 (2001). 

35. Salvi, D. & Mariottini, P. Molecular phylogenetics in 2D: ITS2 rRNA evolution 
and sequence-structure barcode from Veneridae to Bivalvia. Mol. Phylogenet. 
Evol. 65, 792-798 (2012). 

36. Guindon, S. et al. New algorithms and methods to estimate maximum- 
likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 
307-321 (2010). 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 
Extended Data Figure 1 | Analysis of mtCOI amplified from tissue 
and haemocyte DNA of normal and diseased mussels (M. trossulus). 
A partial region of the mtCOI gene was amplified from genomic DNA 
of solid tissue and haemocytes from mussels (M. trossulus) and directly 
sequenced (Fig. 1b). Trace images show a region flanking a representative 
SNP marked with an open triangle (tissue) or closed triangle 
(haemocytes). a, b, In normal mussels, tissue and haemocyte alleles match 
(with G496 at all positions). c-e, In mussels with disseminated neoplasia, 
the tissue and haemocyte alleles are different. Neoplastic haemocytes 
have A at position 496 and G in tissue, with some A observable in tissue, 
probably because of infiltration of neoplastic haemocytes. 
Extended Data Figure 2 | Quantification of Steamer-like element 
genomic copy number in mussels (M. trossulus), cockles (C. edule), 
and golden carpet shell clams (P. aureus). a-d, Fragments from the 
SLE reverse transcriptase region and EF1α genes were cloned from each 
species. Haploid copy numbers of Steamer-like elements (SLE) were 
quantified by determining the ratio of SLE/EF1α in genomic DNA from 
haemocytes. Single species-specific SLEs were analysed in (a) mussels 
(M. trossulus) and (b) cockles (C. edule). c, d, In golden carpet shell clams, 
one SLE (SLE-Pa) was cloned from a normal P. aureus (clam N2) and a 
different one (SLE-Vc) was cloned from neoplastic cells (clam H2). Both 
SLEs could be found in both species, and qPCR analysis confirmed that 
SLE-Pa is more highly amplified in P. aureus and has fewer copies in 
V. corrugata and in the neoplastic cells derived from V. corrugata. 
Extended Data Table 1 | Morphometric analysis of type A and type B cockle (C. edule) neoplasia 

Cockle ID   Type   Cells counted (N)   Cell diameter ± SD (µm)*   Nuclear diameter ± SD (µm)†   Nuclear/cell ratio ± SD‡ 
M1          A      15                  9.0 ± 1.2                  6.7 ± 1.0                     0.75 ± 0.09 
H2          A      15                  8.7 ± 0.9                  6.6 ± 0.8                     0.76 ± 0.06 
H3          A      10                  8.5 ± 1.1                  6.5 ± 1.0                     0.77 ± 0.07 
total       A      40                  8.8 ± 1.0                  6.6 ± 0.9                     0.76 ± 0.07 
M2          B      10                  7.7 ± 0.7                  5.5 ± 0.5                     0.72 ± 0.07 
M3          B      15                  6.7 ± 1.2                  4.5 ± 0.6                     0.70 ± 0.15 
H1          B      15                  7.3 ± 1.0                  5.4 ± 0.8                     0.75 ± 0.09 
total       B      40                  7.2 ± 1.1                  5.1 ± 0.8                     0.72 ± 0.11 

*Cell diameter is statistically different between types A and B (P < 0.0001). All tests are two-tailed t-tests. 
†Nuclear diameter is statistically different between types A and B (P < 0.0001); for all pairwise comparisons between type A and type B individuals, P < 0.05. 
‡Ratio of nuclear diameter to cell diameter is not significantly different between types. 
Extended Data Table 2 | Primers used in PCR and qPCR 

PCR 
Target                      Forward Primer                              Reverse Primer                              Temp 
mtCOI                       LCO1490 GGTCAACAAATCATAAAGATATTGG           HCO2198 TAAACTTCAGGGTGACCAAAAAATCA          50°C 
ITS                         its-3d gcgtcgatgaagagcgca                   its-4r agttttttttcctccgctta                 55°C 
EF1α 
  Mussel (M. trossulus)     consEF1-F1 ACCATTGATATTGCTYTNTGGAA          MtEF1-R2 ACAATCAAAATGGCACAATC               50°C 
  Cockle (C. edule)         CeEF1-F2 TGGTATCACCATCGATATTGC              CeEF1-R3 CCGTTTCGGATCTCTACAGG               55°C 
  P. aureus & V. corrugata  consVEF1F3b AGGAACCTCTCAAGCTGAYTG           consVEF1R3b AGCAATTACCTCGCTGTANGG           55°C 

Cockle (C. edule) Microsatellite Loci 
Forward Primer                                Reverse Primer                                Temp 
CeATC1-5F-6FAM cgttctaccggcatatgtcac          CeATC1-5R caccttccaccactagaagaaaa             60°C 
CeATC1-22F-VIC caaacctgaccgggtttatt           CeATC1-22R tgactccactttttcagttcca             60°C 
CeATC1-36F-NED gacatgacaaacaggcctca           CeATC1-36R tggcctggtcttatttccac               60°C 
CeATC1-52F-PET aatctgattttgccacctct           CeATC1-52R agctcataggagttgtatacgtaag          55°C 
CeATC1-54F-6FAM tacaaggccgagaaactgct          CeATC1-54R caatgactgccaaatgagga               55°C 
CeATC2-4F-VIC tggaaatgcattcattgagc            CeATC2-4R ccgattgcgttctttgatct                55°C 
CeATC2-11F-NED tggtgtgcaattagatgcttg          CeATC2-11R taggctcgcagaaagatggt               60°C 
CeATC2-34F-6FAM gccatagaggccaccctatt          CeATC2-34R gggctgacaagatttgacatt              60°C 
CeATC2-46F-NED accaaggcagatatcgatcc           CeATC2-46R tccagttttaaacgcactctga             55°C 

Cloning qPCR Standards 
Target                      Forward Primer                              Reverse Primer                              Temp 
EF1α 
  Mussel (M. trossulus)     consEF1F1 ACCATTGATATTGCTYTNTGGAA           consEF1R2 ACGTTGAAACCAACRTTRTC              50°C 
  Cockle (C. edule)         CeEF1-F1 TGGAACAAACTGAAGGCCGA               CeEF1-R3 CCGTTTCGGATCTCTACAGG               50°C 
  P. aureus & V. corrugata  consVEF1F3b AGGAACCTCTCAAGCTGAYTG           consVEF1R3b AGCAATTACCTCGCTGTANGG           55°C 
SLE 
  Mussel (M. trossulus)     DHKPL-F1 TTGAAAGCGACCACAARCCNYT             PXRPW-R1 TTCCGACTTTGGCCCANGGNC              55°C 
  Cockle (C. edule)         LVW-F1 GGTATCCAGCCTAGTNGTNGT                PXRPW-R1 TTCCGACTTTGGCCCANGGNC              55°C 
  P. aureus & V. corrugata  DHKPL-F1 TTGAAAGCGACCACAARCCNYT             PXRPW-VaR1 CIGCAATTTTCTGCCANGGNC            55°C 

qPCR 
Target                      Forward Primer                              Reverse Primer                              Control Plasmids 
EF1α 
  Mussel (M. trossulus)     MEF1qF2 TAGGTATTGGAACAGTGCCAGT              MEF1qR1 AGACTCGTGGTGCATTTCTAC               pMtEF1-SLE4 
  Cockle (C. edule)         CeEF1qF2 CCAGTGGGCCGAGTTGAGAC               CeEF1qR2 CTTCAGTGGTCACGTTGGCAG              pCockleSLE-EF1 
  P. aureus                 VaEF1qF1 AAAGAATGGACAGACCAGAGAG             VaEF1qR1 TGCTTCACACCCAATGTGAAA              pVaN2-EF1SLE 
  V. corrugata              VcEF1qF1 CAAGAATGGACAGACAAGAGAA             VcEF1qR1 TGTTTCACACCCAAGGTGAAG              pVaH2-EF1SLE 
SLE 
  SLE-Mt                    MtSLEqF2 TGGACTACTTTACAAAAGCCAACG           MtSLEqR1 GCTTCGTGCAATTTCACTAACAT            pMtEF1-SLE4 
  SLE-Ce                    cockleRT-F2 GCAGTGTGCAATGTCCCAGT            cockleRT-R2 GACCTGTGTCCGAGGCATCA            pCockleSLE-EF1 
  SLE-Pa                    VaNSLEqF1 TAGAGACGAATTGTCCGTGGG             VaNSLEqR4 CGTGCTCGCTGTAAGCATTT              pVaN2-EF1SLE 
  SLE-Vc                    VaHSLEqF1 TAGAGATGAACTCTCTTTTGA             VaHSLEqR1 TGCACGGCTTAATGTTTTGGA             pVaH2-EF1SLE 
Mitochondrial unfolded protein response controls 
matrix pre-RNA processing and translation 


Christian Münch¹ & J. Wade Harper¹ 


The mitochondrial matrix is unique in that it must integrate the 
folding and assembly of proteins derived from the nuclear and 
mitochondrial genomes. In Caenorhabditis elegans, the mitochondrial 
unfolded protein response (UPRmt) senses matrix protein 
misfolding and induces a program of nuclear gene expression, 
including mitochondrial chaperonins, to promote mitochondrial 
proteostasis'*. While misfolded mitochondrial-matrix-localized 
ornithine transcarbamylase induces chaperonin expression*, our 
understanding of mammalian UPRmt is rudimentary’, reflecting 
a lack of acute triggers for UPRmt activation. This limitation has 
prevented analysis of the cellular responses to matrix protein 
misfolding and the effects of UPRmt on mitochondrial translation 
to control protein folding loads. Here we combine pharmacological 
inhibitors of matrix-localized HSP90/TRAP1 (ref. 8) or LON 
protease’, which promote chaperonin expression, with global 
transcriptional and proteomic analysis to reveal an extensive and 
acute response of human cells to UPRmt. This response encompasses 
widespread induction of nuclear genes, including matrix-localized 
proteins involved in folding, pre-RNA processing and translation. 
Functional studies revealed rapid but reversible translation 
inhibition in mitochondria occurring concurrently with defects 
in pre-RNA processing caused by transcriptional repression 
and LON-dependent turnover of the mitochondrial pre-RNA 
processing nuclease MRPP3 (ref. 10). This study reveals that acute 
mitochondrial protein folding stress activates both increased 
chaperone availability within the matrix and reduced 
matrix-localized protein synthesis through translational inhibition, and 
provides a framework for further dissection of mammalian UPRmt. 

Protein folding homeostasis is central to cell fitness. Protein 
unfolding in the endoplasmic reticulum (ER) promotes transcrip- 
tional induction of ER-associated chaperones to facilitate folding and 
inhibits translation to further reduce the folding load!!. In contrast, 
mechanisms underlying the response to protein misfolding in other 
organelles, including mitochondria, are poorly understood. The mito- 
chondrial matrix folding machinery consists of chaperonins HSPD1/ 
HSP60 and HSPE1/HSP 10, and chaperones including the HSP90 par- 
alogue TRAP1 and mtHSP70. This machinery assists in the folding of 
matrix-localized nuclear-encoded proteins, and their assembly with 
13 respiratory chain proteins encoded by the mitochondrial genome 
(mtDNA). The balance between folding load and chaperone abundance 
is controlled, in part, by the UPRmt. In C. elegans, genetic UPRmt 
activation promotes nuclear localization of the ATFS-1 transcription 
factor to induce expression of mitochondrial chaperonins, thereby 
enhancing matrix folding capacity. While earlier work revealed that 
enforced expression of misfolded ornithine transcarbamylase in HeLa 
cells induced HSPD1 and HSPE1 expression, our understanding of 
UPRmt in human cells is limited. 

Cellular stress responses such as UPRmt are typically fast acting as a 
result of rapid sensing of protein folding stress, but prolonged 
activation can produce confounding effects such as cell death. We 
therefore examined whether gamitrinib-triphenylphosphonium (GTPP), a 
specific inhibitor of the matrix HSP90 chaperone TRAP1 known to cause 
protein misfolding in this compartment, would promote acute 
transcription of HSPD1 and HSPE1 as a readout of UPRmt induction in 
HeLa cells. Acute GTPP treatment (6 h) induced UPRmt as assessed by 
quantitative PCR (qPCR) for HSPD1 and HSPE1 (Extended Data Fig. 1a), 
with a dynamic range (approximately twofold) similar to that seen with 
genetic UPRmt induction in C. elegans. HSPD1 and HSPE1 are among the 
most abundant messenger RNAs (mRNAs) in untreated cells (top 2nd 
percentile), explaining their limited dynamic range upon UPRmt 
(Supplementary Table 1). GTPP treatment did not affect cell viability, 
mitochondrial membrane potential, ATP levels or respiratory chain 
architecture (Extended Data Fig. 1b-e). Longer (24 h) incubations with 
GTPP result in cell death. Consistent with TRAP1 being the causal 
target for GTPP-dependent chaperonin induction, TRAP1 RNA interference 
(RNAi) also induced HSPD1 by qPCR (Extended Data Fig. 1f). 

C/EBP homologous protein (CHOP), a broadly acting transcription 
factor, is induced via UPRER and the integrated stress response via the 
ATF4 transcription factor (ref. 11). CHOP is also induced during UPRmt 
(refs 4, 5) and oxidative stress, but the mechanisms underlying CHOP 
activation in UPRmt and its relationship to UPRER and integrated 
stress response upstream signalling remained unclear. Strikingly, we 
found that GTPP, but not the UPRER activator tunicamycin, respiratory 
chain inhibitors or mitochondrial membrane decouplers, activated 
HSPD1 expression (Fig. 1a and Extended Data Fig. 2a). GTPP also 
activated ATF4 and CHOP but, unlike tunicamycin, did not induce BIP, 
indicating that GTPP does not activate canonical UPRER (Fig. 1b, c and 
Extended Data Fig. 2a, b). We also found that individual depletion of 
the four known EIF2A kinases involved in integrated stress response 
signalling (GCN2, HRI, PERK and PKR) (ref. 11) had no effect on CHOP 
induction by GTPP (Extended Data Fig. 2c-f), suggesting that induction 
of ATF4 and CHOP by UPRmt occurs through a pathway independent 
of individual integrated stress response kinases (Extended Data 
Fig. 2b). Taken together, these data indicate that GTPP induces UPRmt 
through a pathway distinct from known ER and mitochondrial stress 
pathways (Extended Data Fig. 2b). 

To globally examine the mammalian UPRmt transcriptional response, 
we treated HeLa cells with GTPP for 6 h and performed RNA sequencing 
(RNA-seq) (Fig. 1d, e, Extended Data Fig. 3a, b and Supplementary 
Table 1). In parallel, we determined RNA-seq profiles upon treatment of 
cells with CDDO, an inhibitor of matrix protease LON (Fig. 1d). CDDO 
rapidly induces mitochondrial protein misfolding and induced HSPD1 
expression, consistent with UPRmt induction (Extended Data Fig. 3c). 
From 968 (GTPP) and 1,029 (CDDO) transcripts whose abundance 
changed significantly (P < 0.05, change > log2 ±0.6), 627 were shared between 
the two different treatments with 337 and 290 downregulated and 
upregulated transcripts, respectively, including HSPD1 and HSPE1, and 
CHOP (Fig. 1d-f and Extended Data Fig. 3d, e). Importantly, changes 
in transcription with GTPP treatment were distinct from changes 
previously reported with 17-AAG, a derivative of GTPP that inhibits 
cytoplasmic and nuclear HSP90 (Extended Data Fig. 3e), indicating 
that inhibition of non-mitochondrial HSP90 is unlikely to account for 
the transcriptional response with GTPP. Gene ontology enrichment 
analysis confirmed extensive overlap in the transcriptional responses, 
with all gene ontology clusters representing transcripts altered with both 

Figure 1 | Global analysis of transcriptional responses to UPRmt 
induction. a-c, qPCR of HSPD1 (a), CHOP and BIP (b) or ATF4 (c) mRNA 
in HeLa cells with or without the indicated treatments (mean of levels 
relative to untreated ± s.d.; n = 3 biological replicates). 
d, Experimental design (top). Volcano plot showing fold changes versus 
P values for the analysed transcriptome of cells treated with GTPP 
(bottom left) or CDDO (bottom right). Transcripts significantly changing 
upon UPRmt induction (P < 0.05, changes > log2 ±0.6) are represented by 
black dots. e, Correlation of ratios of transcripts changing upon GTPP 
or CDDO treatment. Black dots, P < 0.05, changes > log2 ±0.6; red dots, 
genes of interest. f, Summary of altered transcripts. g, h, Gene 
ontology (GO) enrichment map (g) and heat map (h) of overlapping 
mitochondrial transcripts altered by both GTPP and/or CDDO. 


treatments (Fig. 1g and Supplementary Table 2). As expected, gene 
ontology terms showed enrichment for protein folding genes, consistent 
with UPRmt induction, but also included transfer RNA (tRNA) processing 
and activation. Among the nuclear genes with correlated changes in 
transcription, 36 encode proteins known to localize in mitochondria 
(Fig. 1h and Supplementary Table 1). Promoter analysis of genes 
regulated by UPRmt induction showed enrichment of CHOP and ATF4 
promoter recognition sequences, as well as two ‘mitochondrial UPR 
Response Element’ (MURE1 and MURE2) promoter elements 
(P < 0.0001; Extended Data Fig. 4 and Supplementary Table 3). This 
analysis therefore revealed a specific nuclear response to UPRmt that is 
anticipated to promote homeostasis of protein folding within mitochondria. 

We then applied MultiNotch proteomics (Extended Data Fig. 5a) to 
purified mitochondria to quantify acute changes in the mitochondrial 
proteome upon GTPP treatment, using untreated cells or cells treated with 
the mitochondrial uncoupler CCCP (carbonyl cyanide-m-chlorophenyl 
hydrazone) as controls (Fig. 2a and Supplementary Table 4). From 
606 mitochondrial proteins quantified (442 with 2 or more peptides), 
61 displayed significant changes in abundance 6 h after GTPP treatment 
compared with control or CCCP-treated cells, including HSPD1 
and HSPE1, which increased as expected (Fig. 2a, b and Extended Data 
Fig. 5b). Furthermore, proteins involved in respiration, transcription, 
tRNA processing and protein quality control, among others, were found 
to be regulated (Fig. 2c). In contrast, levels of the mitochondrial ribosome 
and respiratory chain complexes were not significantly altered (Extended 
Data Fig. 5c, d), consistent with their long half-lives. Strikingly, the 
abundance of the mitochondrial matrix protein MRPP3 was reduced 
at both the transcriptional and protein level (Fig. 1e, h and Fig. 2b-d). 
MRPP3 is the catalytic subunit of the RNA-free mitochondrial RNase 
P complex, which also includes MRPP1 and MRPP2 (ref. 10). MRPP1 
and MRPP2 mRNA and protein levels were unchanged or increased 
in response to GTPP or CDDO (Fig. 2d), suggesting a rather specific 
downregulation of MRPP3 in the context of RNase P. 

mtDNA-derived polycistronic pre-RNA contains protein coding and 
ribosomal RNA elements flanked by tRNA genes. RNase P and RNase 
Z cleave pre-RNA 5’ and 3’ of tRNAs, respectively, with 5’ cleavage 
preceding 3’ cleavage. Consistent with reduced MRPP3 upon UPRmt, 
we observed a 1.5- to 4.5-fold increase in non-processed mitochondrial 
tRNALys and tRNAMet 6 h after GTPP treatment (Fig. 3a and Extended 
Data Fig. 6a), comparable to effects seen upon depletion of MRPP3 
by siRNA (Extended Data Fig. 6b, c). To independently examine 
pre-RNA processing, we analysed coverage of pre-RNA cleavage sites 
in tRNAMet and tRNALys by RNA-seq. When pre-RNA processing is 
defective, sequence reads from the adjacent mRNA can extend into 
the tRNA, indicative of reduced processing as quantified via slopes 
of coverage (Fig. 3b). Indeed, upon UPRmt, we observed increased 
slopes for sequence reads crossing ATP8-tRNALys and ND2-tRNAMet 
junctions (Fig. 3c, d). Pre-RNA processing defects were absent with 
CCCP-dependent damage, suggesting a specific protein folding 
response (Extended Data Fig. 6d). While MRPP3 mRNA and protein 
levels are reduced upon treatment with GTPP, CDDO resulted in 
reduced MRPP3 mRNA levels without reduced MRPP3 protein levels 
(Figs 2d and 3e), suggesting LON-dependent MRPP3 degradation. 
Indeed, co-treatment with GTPP and CDDO resulted in no reduction 
in MRPP3 abundance (Fig. 3e) and, moreover, CDDO co-treatment 
rescued pre-RNA processing defects seen with GTPP alone (Fig. 3f). 
It is currently unclear how MRPP3 is made more susceptible to 
degradation in response to GTPP, but we conclude that this does not occur 
at the level of LON abundance, as LON is not increased upon GTPP 
treatment (Extended Data Fig. 6e, f). 

Figure 2 | Changes in the mitochondrial proteome upon UPRmt 
induction. a, Volcano plot showing fold changes versus P values for total 
quantified and quantified mitochondrial proteins. b, Volcano plot showing 
fold changes versus P values for quantified mitochondrial proteins. 
Proteins significantly changing are indicated by green dots. c, Heatmap 
organized by gene ontology groups of mitochondrial protein level changes. 
Proteins that did not change significantly are indicated in grey. 
d, Histogram of protein (b) and/or mRNA (Fig. 1) abundance for 
chaperonin and mitochondrial RNase P subunits. Two-tailed P values 
*P < 0.05, **P < 0.01, ***P < 0.001, mean of n = 3 (RNA) or n = 2 
(protein) biological replicates. NS, not significant. #P = 0.06. 

Figure 3 | Mitochondrial pre-RNA processing defects upon UPRmt. 
a, qPCR of mitochondrial pre-RNA at tRNAMet and tRNALys RNase P 
processing sites upon induction of UPRmt with GTPP (6 h). Error bars, 
averages ± s.d. (n = 3 biological replicates). b, RNA-seq for analysis of 
mitochondrial pre-RNA processing defects based on number of reads 
crossing the tRNA/mRNA gene junction. Slope of coverage in the tRNA gene 
adjacent to the cut site is used as a measure of processing. c, d, Normalized 
RNA-seq coverage across tRNA/mRNA gene borders for tRNAMet (c) and 
tRNALys (d) with average of slopes (± s.d.) as described in b indicated in the 
inset (n = 3 biological replicates, two-tailed P values *P < 0.05, ***P < 0.001). 
e, Quantitative western blot analysis of MRPP3 levels upon treatment of 
cells with dimethylsulfoxide (DMSO), GTPP, CDDO or GTPP + CDDO 
co-treatment for 6 h. f, Mitochondrial pre-RNA accumulation upon 
co-treatment with GTPP and CDDO (as in a). Data are average values ± s.d. 
(n = 3 biological replicates). For gel source data, see Supplementary Fig. 1. 

We next examined whether loss of MRPP3 and defects in pre-RNA 
processing during UPRmt could be overcome by stable expression of MRPP3. 
Previous studies have shown that alterations in the abundance of 
mitochondrial RNase P components can alter pre-RNA processing in 
unanticipated ways, making interpretation of effects of MRPP3 
overexpression difficult. Similarly, we found that elevated MRPP3 levels 
(~11-fold) altered steady-state processing efficiencies, with enhanced 
tRNAMet processing and tRNALys displaying enhanced 3’ processing 
and decreased 5’ processing (Extended Data Fig. 7a, b). While MRPP3 
levels were still reduced upon GTPP treatment, consistent with LON 
activity not being limiting, residual MRPP3 remained approximately 
fivefold higher than in untreated cells (Extended Data Fig. 7a). 
Importantly, residual MRPP3 partly rescued tRNAMet and tRNALys 
processing (Extended Data Fig. 7c), consistent with the notion that loss 
of MRPP3 during UPRmt contributes to pre-RNA processing defects. 

UPRER inhibits cytosolic translation through phosphorylation of 
eIF2α and local degradation of mRNAs by IRE1 (ref. 11). The alterations 
in genes linked with mitochondrial protein synthesis (Figs 1 and 2) 
together with the finding that mitochondrial pre-RNA processing 
is deficient during UPRmt led us to examine whether UPRmt affects 
translation of mRNAs derived from mtDNA (Fig. 4a). Indeed, GTPP 
treatment (6 h) strongly inhibited 35S-methionine incorporation into 
newly synthesized respiratory chain components in a 
concentration-dependent manner (Fig. 4b, c and Extended Data Fig. 8a) 
without affecting cytoplasmic translation rates (Extended Data Fig. 8b). 
To further validate the inhibitory effect of UPRmt on mitochondrial 
translation, we used stable isotope labelling by amino acids in culture 
(SILAC) and mass spectrometry to quantify the ratio of newly 
synthesized (K8-Lys) to pre-existing (K0-Lys) protein for mitochondrially 
encoded proteins (Fig. 4d). Translational inhibition was confirmed 
for ND5, COI, ATP6 and ATP8 (Fig. 4d and Extended Data Figs 8c 
and 9a, b), with peptide coverage comparable to previous deep 
proteome studies in HeLa cells. Translational inhibition by GTPP, as 
well as pre-RNA processing, was largely recovered within 4 h of GTPP 
wash-out (Fig. 4e and Extended Data Fig. 10a, b), indicating that UPRmt 
is rapidly reversible. 

We find that acute mitochondrial folding stress promotes a 
multifaceted response involving (1) altered expression of nuclear 
genes, including mitochondrial chaperonins, to increase matrix protein 
folding capacity, (2) transcriptional repression and LON-dependent 
degradation of MRPP3 to reduce pre-RNA processing and (3) induction 
of rapid but reversible translational inhibition of mtDNA-encoded 
proteins, thereby reducing matrix folding load (Fig. 4f). As with 
pre-RNA processing (Extended Data Fig. 7b), cells overexpressing 
MRPP3 display altered translation of mtDNA-encoded proteins, with 
ND5, COI, ND2 and COIII showing decreased translation relative to 
control cells (Extended Data Fig. 10c), which complicates interpreta- 
tion. However, residual MRPP3 post-GTPP treatment did not rescue 
bulk mitochondrial translation (Extended Data Fig. 10c). This could 
reflect sub-threshold levels of tRNA processing despite partial rescue 
(Extended Data Fig. 7) or redundancy in the UPRmt pathway, thereby 
affecting other steps in the translation pathway (Fig. 4f), as is the case 
with UPRER (ref. 11). Alternatively, because MRPP1-dependent tRNA 
methylation critical for tRNA maturation requires assembly with 
MRPP3 (ref. 21), MRPP3 overexpression may uncouple pre-RNA 
processing from tRNA methylation, resulting in translational defects 
despite the presence of MRPP3. While the TFB1M methyltransferase 
responsible for mitochondrial 12S rRNA methylation is reduced 
transcriptionally (Fig. 1h), its protein abundance is unchanged at 6h 
post-GTPP (Extended Data Fig. 10d), indicating that defects in rRNA 
methylation do not underlie translational inhibition. Thus, further 
studies are required to understand the regulation of mitochondrial 
translation with and without mitochondrial stress. In keeping with the 
transient nature of stress responses, our work has focused on acute 
effects of UPRmt. Components linking mitochondrial protein misfolding 
to the nucleus remain to be identified, as ATFS-1 orthologues are 
lacking in mammals. Interestingly, the stress-inducible protein ATF3 
(ref. 22), which contains a basic leucine zipper-like domain similar 
to ATFS-1, and which can function with CHOP, is also induced 1.5- to 
4-fold by UPRmt (Supplementary Table 1), suggesting a possible role 
in UPRmt signalling. Prolonged UPRmt and concomitant translational 
inhibition probably lead to confounding effects that would be 
detrimental to mitochondrial health, consistent with the application of 
GTPP to cancer therapeutics. The transcriptional and proteomic data 
reported here provide a framework for the further elucidation of cir- 
cuits that contribute to protein homeostasis within mitochondria, and 
for the development of approaches that can manipulate the response 
of cells to mitochondrial folding stress, as might occur in pathological 
conditions including cancer and neurodegenerative diseases. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 

Chemicals and antibodies. Chemicals used were LysC (VWR catalogue number 
100369-822), CDDO (Cayman Chemicals catalogue number 81035), emetine 
(Sigma catalogue number E2375), CCCP (Sigma catalogue number C2759), 
rotenone (Sigma catalogue number R8875), paraquat (Sigma catalogue 
number 36541), TTFA (Sigma catalogue number T27006), 3-nitropropionic 
acid (Sigma catalogue number N5636), antimycin A (Sigma catalogue number 
A8674), myxothiazol (Sigma catalogue number T5580), potassium cyanide 
(Sigma catalogue number 60178), valinomycin (Sigma catalogue number 
V0627) and K8 lysine (Cambridge Isotopes). An original aliquot of GTPP 
was a gift from D. C. Altieri; a second aliquot was custom synthesized by 
Shanghai ChemPartner Co. Antibodies used were anti-MRPP3 (LSBio catalogue 
number LS-C332515, western blot 1:500), anti-TOM20 (Santa Cruz catalogue 
number sc-11415, western blot 1:500), anti-LON (Sigma catalogue number 
HPA002192, western blot 1:500), anti-ACTIN (Santa Cruz catalogue 
number sc69879, western blot 1:500), anti-TFB1M (Abcam catalogue number 
69871, western blot 1:400), anti-NDUFA9 (Abcam catalogue number ab14713, blue 
native 1:1,000), anti-SDHA (Abcam catalogue number ab14715, blue native 1:1,000) 
and anti-UQCRC2 (Abcam catalogue number ab14745, blue native 1:1,000). 

Cell culture and assays for cytotoxicity, mitochondrial membrane potential and 
cellular ATP levels. HeLa cells were purchased from the American Type Culture 
Collection (ATCC) and not further authenticated. They were confirmed to be 
mycoplasma negative, and grown in RPMI medium supplemented with 1× glutamax 
(Invitrogen catalogue number 61870-127) and 10% fetal bovine serum. For all 
experiments, cells were treated with DMSO and 10 µM GTPP (or concentration as 
indicated) and/or 2.5 µM CDDO for 6 h. For CCK8 cytotoxicity assays, cells were 
plated in clear bottom 96-well plates, processed according to the manufacturer's 
instructions (CCK8 Dojindo CK04-05) and quantified on a VersaMax microplate 
reader (Molecular Devices). For mitochondrial membrane potential determina- 
tion, cells were treated with JC-1 (Life Technologies catalogue number T3168) 
according to the manufacturer's instructions. Cells were harvested and analysed 
by fluorescence-activated cell sorting on a BD FACSCalibur. To assess cellular ATP 
levels, cells were plated on 96-well clear-bottom plates and treated with DMSO, 
GTPP or 100 µM antimycin A. ATP levels were measured with the Mitochondrial 
ToxGlo assay (Promega G8000) and analysed on a Molecular Devices SpectraMax 
M5 multi-mode plate reader. 

Quantitative PCR and RNA sequencing. Total RNA was harvested using 
NucleoSpin RNA or NucleoSpin miRNA for analysis of pre-RNA processing 
(Macherey-Nagel catalogue numbers 740955 and 740971). RNA was quanti- 
fied and equal amounts were reverse transcribed into complementary DNA 
(cDNA) using a High Capacity cDNA Reverse Transcription Kit (Applied 
Biosystems catalogue number 4368814). Quantitative PCR was performed using 
TaqMan Fast universal PCR Master Mix (Applied Biosystems catalogue number 
4366072) or Fast SYBR Green Master Mix (Life Technologies catalogue number 
4385612) using an Applied Biosystems 7500 Fast Real-time PCR machine 
with the following primers: tRNAMet forward: AGTAAGGTCAGCTAAATAAG, 
tRNAMet 5’ upstream forward: GAATCGAACCCATCCCTGAG, tRNAMet 
reverse: TAGTACGGGAAGGGTATAACC, tRNAMet downstream reverse: 
GTGTGCCTGCAAAGATGGTAG, tRNALys forward: CACTGTAAAGCTAACTTAGC, 
tRNALys 5’ upstream forward: GAAATAGGGCCCGTATTTACC, 
tRNALys reverse: TCACTGTAAAGAGGTGTTGG, tRNALys 3’ downstream 
reverse: GATGAGGAATAGTGTAAGGAG, GAPDH forward: ATGCCTCCTGCACCACCAAC, 
GAPDH reverse: GGGGCCATCCACAGTCTTCT, ND4 forward: 
CTTCGAAACCACACTTATCC, ND4 reverse: GTATGCAATGAGCGATTTTAGG, 
or Life Technologies TaqMan probes for GAPDH (Hs02758991_g1), 
HSPD1 (Hs03044918_g1), HSPE1 (Hs01654720_g1), MRPP3 (Hs00206448_m1), 
TRAP1 (Hs00212474_m1), DDIT3 (Hs00358796_g1), ATF4 (Hs00909569_g1) and 
BIP (Hs00607129_gH). For analysis of pre-RNA processing, data were normalized 
to tRNA levels obtained from using internal forward and reverse tRNA primers. 
For analysis of integrated stress response activation, cells were treated with 10 µM 
GTPP, 10 µg ml−1 tunicamycin, 5 µM rotenone, 0.5 mM paraquat, 0.5 mM TTFA, 
10 mM 3NP, 100 µM antimycin A, 3 µM myxothiazol, 1 mM KCN, 10 µM CCCP 
or 1 µM valinomycin for 6 h before purification and analysis of RNA levels by 
quantitative PCR. 
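
The qPCR readouts above are reported as levels relative to untreated cells. The text does not state how the relative levels were computed, so the following is only a minimal sketch that assumes the common 2^-ΔΔCt calculation with a reference gene such as GAPDH. The function name and the example Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript (treated vs control).

    Assumption: 2^-ddCt method with a single reference gene and
    roughly 100% amplification efficiency. This is an illustration,
    not necessarily the calculation used by the authors.
    """
    # Normalize each sample to its reference gene (dCt).
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare treated with control (ddCt) and convert to fold change.
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold one cycle
# earlier after treatment, with an unchanged reference gene.
fold = relative_expression(24.0, 18.0, 25.0, 18.0)
```

With these made-up values the target is called twofold induced, which is in the range the main text describes for HSPD1 and HSPE1.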

For RNA sequencing, total RNA samples were submitted to the Harvard 
Bauer Core Facility for processing (ribosomal depletion with RiboZero), direc- 
tional RNA-seq library preparation and 12 cycle amplification using LongAmp 
(New England BioLabs) and indexed primers (Integrated DNA Technologies), 
quality control and sequencing in one flow cell on a 75 bp paired-end NextSeq 
for transcriptome analysis or one lane on a 100 bp paired-end HiSeq to monitor 
mitochondrial RNA processing. For transcriptome analysis, reads were examined 
by FastQC and analysed by the tophat2 version 1.2 analysis pipeline by Harvard 
Medical School Research Computing against hg19, consisting of analysis by tophat, 
cufflinks, and cuffmerge. For analysis of mitochondrial pre-RNA processing, reads 
were examined by FastQC, trimmed with cutadapt (for PHRED scores below five) 
and aligned to hg19 (augmented with transcript information from GRCh37.75) 
by STAR. Alignments were checked by FastQC and RNA-SeQC, and read counts 
of known genes detected by featureCounts. 

To analyse the RNA-seq data set for pre-RNA processing, coverage data across 
a tRNA/mRNA region was normalized for reads within the protein-coding gene 
region across all six experimental conditions at every cut site. Slopes of the first 
ten nucleotides within the tRNA genes adjacent to the cut site were determined in 
Microsoft Excel (presented slopes had correlation values of R > 0.9) and calculated 
as an average of the mean with two-tailed P values. 
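
The slope measurement described above was done in Microsoft Excel. As an illustration only, the same idea can be sketched in Python: normalize tRNA coverage to the adjacent protein-coding region, fit a line to the first ten tRNA nucleotides next to the cut site, and keep the slope only if the fit is strong. The function names, the use of mean mRNA coverage for normalization and the use of the absolute value of R are assumptions made for this sketch.

```python
def linear_fit(ys):
    """Least-squares slope and Pearson R of ys against positions 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # A flat profile has no defined correlation; report R = 0.
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, r


def processing_slope(trna_coverage, mrna_coverage, window=10, min_abs_r=0.9):
    """Slope of normalized coverage in the tRNA gene next to the cut site.

    trna_coverage: per-nucleotide read coverage, ordered from the cut
        site into the tRNA gene (assumed orientation).
    mrna_coverage: per-nucleotide coverage of the adjacent coding gene,
        used here as the normalization reference (assumption).
    Returns None when the fit does not reach |R| > min_abs_r.
    """
    mrna_mean = sum(mrna_coverage) / len(mrna_coverage)
    if mrna_mean <= 0:
        raise ValueError("mRNA region has no coverage")
    normalized = [c / mrna_mean for c in trna_coverage[:window]]
    slope, r = linear_fit(normalized)
    return slope if abs(r) > min_abs_r else None
```

A call such as `processing_slope(trna_cov, mrna_cov)` on each replicate would give per-replicate slopes that can then be averaged and compared between conditions.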
Gene ontology enrichment analysis. Sets of genes of interest were uploaded and 
searched with the DAVID online tool (http://david.abcc.ncifcrf.gov/home.jsp) for 
enriched biological processes (GOTERM_BP_FAT). Functional annotation charts 
were saved and visualized with the enrichment map app (version 2.0.1; ref. 23) in 
Cytoscape (version 3.2.1, P < 0.001). Clusters were annotated according to their general 
function, based on their overlapping biological processes. 
Promoter analysis. Three thousand bases upstream of the transcription start site 
of the transcripts encoding mitochondrial proteins and regulated by UPRmt were 
extracted from Ensembl. These promoter sequences were analysed by FIMO 
(Find Individual Motif Occurrences, version 4.11.1; ref. 24). Motifs were provided as 
indicated and scanned on a provided DNA database with the listed promoter 
sequences. P values were set at 0.0001 and defined as the probability of random 
sequences of identical length achieving a similar or better score as the sequence 
provided. 
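
The first step above, taking three thousand bases upstream of each transcription start site, depends on the strand of the gene. A minimal sketch of that coordinate arithmetic is given here. The function name and the 0-based, half-open coordinate convention are assumptions for illustration and are not taken from the Methods.

```python
def promoter_interval(tss, strand, length=3000):
    """Genomic interval covering `length` bases upstream of a TSS.

    tss: 0-based position of the first transcribed base (assumed
        convention).
    strand: "+" or "-".
    Returns (start, end) as a 0-based, half-open interval on the
    forward strand. For "-" genes the sequence still has to be
    reverse-complemented before motif scanning.
    """
    if strand == "+":
        # Upstream lies at lower coordinates; clip at the chromosome start.
        return max(0, tss - length), tss
    if strand == "-":
        # Upstream of a reverse-strand gene lies at higher coordinates.
        return tss + 1, tss + 1 + length
    raise ValueError("strand must be '+' or '-'")
```

For example, `promoter_interval(5000, "+")` gives `(2000, 5000)`, whereas the same TSS on the reverse strand gives `(5001, 8001)`.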
RNAi experiments. Cells were grown on 12-well plates and RNAi was transfected 
with Lipofectamine 3000 (Life Technologies) according to the manufacturer's 
instructions. RNAi used was MRPP3 (Ambion, AM16708, ID 21858), and 
TRAP1 (DF/HCC DNA Resource Core IDs: HsSH00112394, HsSH00112407), 
EIF2AK1 (Dharmacon LQ-005007-00-0002), EIF2AK2 (Dharmacon LQ-003527- 
00-0002), EIF2AK3 (Dharmacon LQ-004883-00-0002) and EIF2AK4 (Dharmacon 
LQ-005314-00-0002). 
Cell line generation. Human cDNA for MRPP3 was purchased from Sino 
Biological (HG14131-G) and transferred into a pHAGE lentiviral vector. Virus 
particles were produced in HEK293T cells after transfection with the lentiviral 
vector and helper vectors (VSVG, Tatlb, Mgpm2, CMV-Rev) and used to infect 
HeLa cells. Cells were selected in 1 µg ml−1 puromycin. 
Mass spectrometry. For quantitative analysis of the mitochondrial proteome, HeLa 
cells were treated with 10 µM GTPP, CCCP, or DMSO for 6 h. Mitochondria were 
purified as previously described using Basic Protocol 1 (ref. 25). Briefly, cells were 
scraped into cold PBS, collected by centrifugation, resuspended in lysis buffer and 
sonicated. Crude mitochondria were acquired by differential centrifuga- 
tion and purified mitochondria obtained by separation on a sucrose cushion. 
Similar amounts of mitochondria were obtained under the different treatments. 
Mitochondrial pellets were resuspended in lysis buffer (6M GdnHCl, 75 mM NaCl, 
50 mM Tris, pH 8.5, 1mM PMSF, 1x OPT) and sonicated. Samples were reduced, 
alkylated with iodoacetamide and proteins were precipitated using chloroform/ 
methanol. Protein pellets were resuspended in 8 M urea in 50 mM Tris, pH 8.8 
and subsequently diluted with 50 mM Tris, pH 8.8 to a urea concentration of 2M. 
Proteins were digested with LysC overnight at 37°C. Digestion reactions were 
stopped by addition of formic acid, dried and purified by C18 stage tip. Samples 
were taken up in 0.2 M HEPES, pH 8.5 buffer, quantified by micro BCA (Thermo 
Scientific catalogue number 23235) and labelled with TMT 6-plex reagents 
(Thermo Scientific) for 1 h at room temperature. Reactions were stopped by addi- 
tion of 5% hydroxylamine for 15 min followed by addition of formic acid. Equal 
amounts of peptide samples were combined to a total of 10 µg and purified on a 
C18 stage tip. Dried peptides were resuspended in 5% acetonitrile/5% formic acid 
and analysed on an Orbitrap Fusion (Thermo Scientific) running a 2 h gradient 
from 6% to 30% acetonitrile using a multi-notch MS3-based method selecting 
ten MS2 fragment ions for analysis by MS3 (Orbitrap, AGC 5 x 10*, 60,000 reso- 
lution, maximum injection time 150 ms). Peptides were identified and quantified 
by a SEQUEST-based in-house tool (developed by the S. P. Gygi laboratory) using 
SEQUEST with a human UniProt database (as of 14 January 2014), and submitted 
to linear discriminant analysis to score peptides and proteins with protein and 
peptide FDR values of 2% (ref. 26). Proteins were collapsed to a protein-level 
FDR of 2%. Searches were run for LysC with a maximum of two missed cleav- 
age sites and with carbamidomethylation of cysteine residues and TMT tags on 
lysine residues and N termini as static modifications, and methionine oxidation as 
variable modification. TMT-based quantitation was performed by TMT-reporter 
ion analysis for all identified proteins. MS3 spectra with a summed signal-to-noise 
ratio of <100 were excluded and the TMT channels normalized across all TMT 
channels (with resulting normalization factors between 1 and 1.252). For final 
analysis of quantified proteins, values were transferred and analysed in Microsoft 
Excel and the following cut-offs were applied: minimum number of two quantified 
peptides, two-tailed P < 0.05, fold change > log2 ±0.35. Quantified proteins were 
determined as mitochondrial if they were found in MitoCarta (ref. 27), or had an IMPI score 
of >0.85 (version Q1 2015, http://www.mrc-mbu.cam.ac.uk/impi). For SILAC 
analysis, cell culture media was replaced with lysine-free media supplemented 
with K8 lysine and dialysed serum, and treated with DMSO or GTPP. After 
6h, cells were harvested and mitochondria purified, lysed and either processed 
as for TMT experiments (experiments 1-3), or run on a NuPAGE Novex 12% 
bis-tris gel (Life Technologies) and cut into five fractions, in-gel reduced, alkylated, 
and digested by LysC (experiment 4). Digested mitochondrial extracts and gel- 
extracted peptides were purified on C18 stage tips and analysed by LC-MS/MS 
on a Q Exactive or Orbitrap Fusion (Thermo Scientific) as indicated. Q Exactive 
analysis used a maximum injection time of 250 ms, an AGC target of 10°, resolu- 
tion of 70,000 and automatic dynamic exclusion settings. For SILAC analysis on 
the Orbitrap Fusion, maximum injection times were set at 100 ms, AGC target at 
2 x 10°, 120,000 resolution and a dynamic exclusion of 90s. Experiments were 
processed with our in-house analysis tool and/or MaxQuant (as indicated). For our 
in-house tool (Core), analysis was as above and quantification done by analysing 
peak heights for the heavy and light forms of a peptide. We performed MaxQuant 
analysis (version 1.5.2.8) with standard Orbitrap settings and LysC digestion mode 
with cysteine carbamidomethylation as static and methionine oxidation as variable 
modification against a UNIPROT library (as of 9 March 2015). The minimum 
ratio count of the protein quantification was set at 1. Results were exported into 
Microsoft Excel to calculate heavy-to-light ratios of peptides to determine the 
percentage of newly synthesized protein as a fraction of heavy peptide intensity 
versus total intensity. Results of the MaxQuant/Core quantifications are shown in 
Extended Data Fig. 9c. Owing to the consistent difficulty of both analysis tools in 
determining heavy peptide intensities in the GTPP-treated samples, heavy and 
light peptide intensities were also manually determined from MS1 at the observed 
m/z values and retention times determined by the MaxQuant/Core analyses 
(Fig. 4d and Extended Data Fig. 10). 
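
The SILAC readout described above is the fraction of heavy (K8) peptide intensity over total intensity. The authors did this calculation in Microsoft Excel; the sketch below only restates the arithmetic. The function names, the input layout and the choice to average peptides per protein are assumptions for illustration.

```python
from collections import defaultdict


def fraction_newly_synthesized(heavy, light):
    """Heavy (K8) intensity as a fraction of heavy plus light (K0)."""
    total = heavy + light
    if total <= 0:
        raise ValueError("peptide has no measured intensity")
    return heavy / total


def percent_new_by_protein(peptides):
    """Mean percentage of newly synthesized protein per protein.

    peptides: iterable of (protein, heavy_intensity, light_intensity).
    Averaging the peptide-level percentages is an assumption; other
    summaries (for example the median) are equally possible.
    """
    per_protein = defaultdict(list)
    for protein, heavy, light in peptides:
        per_protein[protein].append(
            100.0 * fraction_newly_synthesized(heavy, light))
    return {p: sum(v) / len(v) for p, v in per_protein.items()}


# Hypothetical intensities for two mitochondrially encoded proteins.
example = percent_new_by_protein([
    ("ND5", 1.0, 3.0),
    ("ND5", 2.0, 2.0),
    ("COI", 0.0, 5.0),
])
```

Comparing these percentages between DMSO- and GTPP-treated samples gives the kind of translation readout shown in Fig. 4d.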

Mitochondrial translation assay. HeLa cells were grown on a 12-well plate and 
treated for 6h with DMSO or different concentrations of GTPP as described. 
After 5.5h, media was replaced with RPMI lacking methionine and containing 
10% dialysed fetal bovine serum, GTPP or DMSO (at the original concentration), 
and 100 µg ml−1 emetine to block cytosolic translation. After 10 min, 100 µCi ml−1 
EasyTag 35S-methionine (Perkin Elmer catalogue number NEG709A500UC) 
was added and incubated for another 20 min, totalling 6h of GTPP treatment. 
Cells were washed with PBS and harvested in 1x NuPAGE LDS sample buffer 
(Life Technologies) containing 25 mM DTT. Samples were boiled and analysed 
on a NuPAGE Novex 12% bis-tris gel (Life Technologies). Gels were stained using 
InstantBlue (Expedeon), dried onto Whatman paper and visualized on a Bio-Rad 
Personal Molecular Imager for newly synthesized and radioactive proteins. An 
image was taken of the InstantBlue stained gel to confirm equal loading. These 
experiments were performed three independent times. For pulse-chase analysis, 
the same protocol was used with washes as indicated. This experiment was 
performed twice independently. 

Blue native. Crude mitochondria were obtained as above and lysed in 1% 
digitonin, followed by separation on 4-16% BN-PAGE as previously described (ref. 28). 
Proteins were transferred onto polyvinylidene difluoride membranes and detected 
using antibodies as indicated. A small aliquot was also analysed by standard 
western blot to confirm equal loading. 

Data reporting and statistics. No statistical methods were used to predetermine 
sample size. The experiments were not randomized. The investigators were not 
blinded to allocation during experiments and outcome assessment. All quantitative 
experiments are presented as means ± s.d. of at least two independent biological 
experiments (as indicated) and were analysed by a two-tailed Student's t-test 
(considered significant for P < 0.05). 
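
As an illustration of the statistics described above, the following minimal Python sketch computes means ± s.d. and a two-tailed Student's t-test for two small groups. The replicate values are invented for the example, and the equal-variance form of the test and the numerical integration of the t density are assumptions made here, not details stated in the Methods.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value: 1 - 2 * integral of the density from 0 to |t|.

    The integral is evaluated with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def student_t_test(a, b):
    """Equal-variance two-sample t-test; returns (t, df, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df, two_tailed_p(t, df)


# Hypothetical relative levels from three biological replicates per condition.
control = [1.00, 0.95, 1.05]
treated = [1.60, 1.45, 1.75]

t, df, p = student_t_test(control, treated)
print(f"control: {mean(control):.2f} ± {stdev(control):.2f}")
print(f"treated: {mean(treated):.2f} ± {stdev(treated):.2f}")
print(f"t = {t:.3f}, df = {df}, two-tailed P = {p:.4f}, significant: {p < 0.05}")
```

With only two or three replicates per group the test has little power, so a sketch like this is mainly useful for checking how a reported P value follows from the means and standard deviations.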


23. Merico, D., Isserlin, R., Stueker, O., Emili, A. & Bader, G. D. Enrichment map: 
a network-based method for gene-set enrichment visualization and 
interpretation. PLoS ONE 5, e13984 (2010). 

24. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a 
given motif. Bioinformatics 27, 1017-1018 (2011). 

25. Bozidis, P., Williamson, C. D. & Colberg-Poley, A. M. in Current Protocols in Cell 
Biology (eds Bonifacino, J. S. et al.) Ch. 3, Unit 3.27 (Wiley, 2007). 

26. Huttlin, E. L. et al. A tissue-specific atlas of mouse protein phosphorylation and 
expression. Cell 143, 1174-1189 (2010). 

27. Pagliarini, D. J. et al. A mitochondrial protein compendium elucidates complex 
disease biology. Cell 134, 112-123 (2008). 

28. McKenzie, M., Lazarou, M., Thorburn, D. R. & Ryan, M. T. Mitochondrial 
respiratory chain supercomplexes are destabilized in Barth syndrome patients. 
J. Mol. Biol. 361, 462-469 (2006). 

Extended Data Figure 1 | Mitochondrial HSP90 inhibition induces 
UPRmt. a, Quantitative PCR monitoring chaperonin (HSPD1 and 
HSPE1) mRNA levels upon treatment of cells with GTPP. Shown are 
means of levels relative to untreated ± s.d. (n = 3 biological replicates). 
b, Measurement of cell viability upon GTPP treatment with CCK8. Shown 
are means of levels relative to untreated ± s.d. (n = 5 biological replicates). 
c, Measurement of mitochondrial membrane potential upon GTPP 
or CCCP (mitochondrial membrane potential uncoupler) treatment, 
measured with JC-1 and analysed on a BD FACSCalibur. Shown are 
means of levels relative to untreated ± s.d. (n = 3 biological replicates). 
d, Measurement of cellular ATP levels upon GTPP or antimycin A 
(electron transport chain inhibitor) treatment. Shown are means of levels 
relative to untreated ± s.d. (n = 4 biological replicates) and two-tailed 
P values: ***P < 0.001; NS, not significant. e, Blue native gel analysis 
of mitochondrial respiratory chain complexes upon 6 h treatments of 
DMSO or GTPP. f, Changes in chaperonin and TRAP1 mRNA levels upon 
knockdown with shRNA targeting GFP or TRAP1 mRNA. Shown are 
means of log2 fold changes relative to control and s.d. (n = 3 biological 
replicates). 

Extended Data Figure 2 | UPRmt signals distinctly from the integrated 
stress response. a, Table with summarized results of data shown in 
Fig. 1a-c. Induced genes are labelled green and compounds are clustered 
into their molecular function. GTPP induces UPRmt and tunicamycin 
UPRER. All other compounds affect mitochondrial respiration and/or the 
mitochondrial membrane potential. b, Schematic showing how different 
stresses signal through the integrated stress response pathway based on the 
results shown in b and Fig. 1a-c. c, Quantitative PCR to assess the mRNA 
knockdown of the four EIF2A kinases by siRNA smart pools in biological 
duplicate; repl., replicate. d, Quantitative PCR monitoring CHOP mRNA 
levels in untreated or GTPP-treated cells with or without knockdown of 
the EIF2A kinases as in c. e, Quantitative PCR to monitor PERK mRNA 
levels upon PERK knockdown with individual siRNAs in biological 
duplicate. f, Quantitative PCR monitoring CHOP mRNA levels in GTPP-treated 
cells with or without knockdown of PERK by individual siRNAs in 
biological duplicate. 

Extended Data Figure 3 | Global analysis of transcriptional responses 
to UPRmt induction. a, Heatmap of measured transcript abundances of 
cells treated with DMSO, 10 µM GTPP or 2.5 µM CDDO for 6 h (n = 3 
biological replicates). Values not passing the cuffdiff threshold of FPKM 
abundance and read number were excluded (white). b, Correlation 
of replicates for DMSO-, GTPP- and CDDO-treated samples with 
R values depicting correlation value; log2-transformed FPKM values 
(>0) are plotted. c, Quantitative PCR monitoring induction of UPRmt 
by measuring chaperonin mRNA levels upon treatment with DMSO or 
CDDO. Shown are means of levels relative to DMSO-treated ± s.d. (n = 3 
biological replicates). d, Correlation between the abundance of transcripts 
significantly altered in GTPP- versus CDDO-treated cells (Fig. 1c, 
combined panel). e, Table representing changed transcripts upon GTPP 
or CDDO treatment (Fig. 1c) compared with the number of transcripts 
changed upon 17AAG previously reported'®. 

Extended Data Figure 4 | Promoter analysis of UPRmt-induced 
transcripts encoding mitochondrial proteins. Analysis of UPRmt-induced 
(GTPP and CDDO) transcripts encoding mitochondrial proteins for the 
occurrence of CHOP, MURE1, MURE2, or ATF4 promoter elements. 
We used FIMO version 4.11.1 with the consensus sequences shown. Cells 
marked in green represent the presence of the consensus sequence in the 
gene shown. 

Extended Data Figure 5 | Changes in the mitochondrial proteome 
upon UPRmt induction. a, Assay design. b, Summary of proteomic data. 
c, Analysis of changes in the average abundance of mitochondrial 
ribosome (left) or for individual ribosomal subunits (right). Values 
are mean values ± s.d. of scaled signal to noise values (that is, relative 
abundance) derived from the quantitative proteomics (Fig. 2) for 
identified mitochondrial ribosomal subunits (right, n = 2 biological 
replicates) and the average of all these values ± s.d. (left); repl., replicate. 
d, Analysis of the abundance of the different mitochondrial electron 
transport chain complexes and ATP synthase. Values are derived from 
quantitative proteomics (Fig. 2) and shown as mean values ± s.d. across 
all quantified subunits (top left) or separately per subunit for the different 
complexes (n = 2 biological replicates). All data depict scaled signal to 
noise values (that is, relative abundance). 

Extended Data Figure 6 | Mitochondrial pre-RNA processing defects 
upon UPRmt. a, Primer design for monitoring pre-RNA processing. 
Primer pairs 1 and 3, and 2 and 4, will only produce PCR products for [...] 
processed pre-RNAs. Primer pair 2 and 3 will monitor total levels for 
normalization. b, Quantitative PCR of MRPP3 mRNA levels upon 
knockdown with siRNA targeting a scrambled sequence or MRPP3. 
Shown are averages ± s.d. (n = 3 biological replicates). c, qPCR of 
mitochondrial pre-RNA at tRNAMet and tRNA’ RNaseP processing sites 
upon depletion of MRPP3 by RNAi. Error bars, ± s.d. (n = 3 biological 
replicates). d, Quantitative PCR monitoring levels of non-processed [...] 
biological duplicate; repl., replicate. e, LON protein levels as determined by 
quantitative proteomics (Fig. 2) in biological duplicate. Shown are scaled 
signal to noise values observed (that is, relative abundance). f, Western 
blot analysis of LON levels upon control or 10 µM GTPP treatment (6 h). 

Extended Data Figure 7 | Rescue of UPRmt-induced mitochondrial pre- 
RNA processing by MRPP3 overexpression. a, Western blot analysis of 
MRPP3 levels upon DMSO or GTPP treatment in the context of wild-type 
or MRPP3-overexpressing (o/e) cells. b, Quantitative PCR analysis of 
non-processed mitochondrial pre-RNA levels at the tRNAMet and tRNAs 
cut sites in wild-type cells or cells overexpressing MRPP3. Shown are mean 
values ± s.d. (n = 3 biological replicates). c, Quantitative PCR analysis of 
non-processed mitochondrial pre-RNA levels at the tRNAMet and tRNA)’ 
cut sites in wild-type cells or cells overexpressing MRPP3 upon GTPP 
treatment. Shown are mean values ± 

Extended Data Figure 8 | Mitochondrial translation defects upon 
UPRmt. a, Coomassie gel staining as a loading control of the same 
experiment as in Fig. 4b. b, Analysis of cytosolic translation upon 
treatment with DMSO or GTPP with the same experimental procedure 
as in Fig. 4a without the addition of emetine. Newly synthesized proteins 
were monitored by phospho-imager (left) with Coomassie staining of the 
same gel as loading control (right). c, Table of experiment number, mass 
spectrometer used, analysis method, peptide sequence, protein encoded 
and heavy-to-light ratios (H/L) used to determine protein synthesis rates 
in Fig. 4d. Fusion and QE are Orbitrap Fusion or Q Exactive (Thermo 
Scientific), respectively; Core depicts in-house mass spectrometry 
analysis pipeline; #, oxidative modification on methionine; *, could not be 
determined by Core/MaxQuant. 

Extended Data Figure 9 | Proteomic determination of mitochondrial translation upon UPRmt. SILAC spectra for the data shown in Fig. 4d and 
Extended Data Fig. 9c for experiment 1. 

*P < 0.05, **P < 0.01, ***P < 0.001, n = 3 biological replicates. c, Analysis 

H4K20me0 marks post-replicative chromatin and 
recruits the TONSL-MMS22L DNA repair complex 

Giulia Saredi1*, Hongda Huang2*, Colin M. Hammond1, Constance Alabert1, Simon Bekker-Jensen3, Ignasi Forne4, 
Nazaret Reverón-Gómez1, Benjamin M. Foster5, Lucie Mlejnkova6, Till Bartke5, Petr Cejka6, Niels Mailand3, Axel Imhof4, 
Dinshaw J. Patel2 & Anja Groth1 


After DNA replication, chromosomal processes including 
DNA repair and transcription take place in the context of sister 
chromatids. While cell cycle regulation can guide these processes 
globally, mechanisms to distinguish pre- and post-replicative 
states locally remain unknown. Here we reveal that new histones 
incorporated during DNA replication provide a signature of post- 
replicative chromatin, read by the human TONSL-MMS22L (refs 1-4) 
homologous recombination complex. We identify the TONSL 
ankyrin repeat domain (ARD) as a reader of histone H4 tails 
unmethylated at K20 (H4K20me0), which are specific to new 
histones incorporated during DNA replication and mark post- 
replicative chromatin until the G2/M phase of the cell cycle. 
Accordingly, TONSL-MMS22L binds new histones H3-H4 both 
before and after incorporation into nucleosomes, remaining on 
replicated chromatin until late G2/M. H4K20me0 recognition 
is required for TONSL-MMS22L binding to chromatin and 
accumulation at challenged replication forks and DNA lesions. 
Consequently, TONSL ARD mutants are toxic, compromising 
genome stability, cell viability and resistance to replication stress. 
Together, these data reveal a histone-reader-based mechanism 
for recognizing the post-replicative state, offering a new angle 
to understand DNA repair with the potential for targeted cancer 
therapy. 

The TONSL-MMS22L complex is an obligate heterodimer required 
for replication fork stability and repair of replication-associated DNA 
damage by aiding RAD51 loading (refs 1-4). TONSL-MMS22L associates 
with soluble non-nucleosomal histones H3-H4 (refs 1, 5), the 
histone chaperone ASF1 (refs 1-4) and MCM2/4/6/7 (refs 1-5) in 
a manner that depends on the TONSL ARD!. We have found that 
histones H3-H4 bridge the interactions between TONSL-MMS22L 
and ASF1 (ref. 1), between ASF1 and MCM2 (refs 6, 7), and between 
TONSL-MMS22L and MCM2 (Extended Data Fig. 1a), suggesting 
simultaneous binding of these proteins to histones H3-H4 in a large 
pre-deposition complex. In addition, TONSL-MMS22L interacts with 
nucleosomal histones in chromatin (Extended Data Fig. 1b). This sug- 
gests that TONSL-MMS22L functions as an H3-H4 histone chap- 
erone while also acting as a histone reader in chromatin. Consistent 
with a chaperone function, TONSL was recently shown to have his- 
tone chaperone activity in vitro’. We therefore set out to explore the 
mechanism of action of TONSL-MMS22L by a structure-function 
approach. 

Full-length TONSL and the ARD alone bound directly to histones 
H3-H4 but not H2A-H2B in vitro (Extended Data Fig. 1c-f). As our 
attempts to crystallize the ARD with H3-H4 were not successful, we 
linked the ARD to the MCM2 histone-binding domain (HBD), because 
a similar design previously enabled us to solve the structure of an H3- 
H4 dimer in complex with MCM2 and ASF1 (ref. 7). We obtained 
crystals of covalently linked MCM2 HBD-G4-TONSL ARD in complex 
with H3 (57-135) and H4 that diffracted to 2.43 Å resolution, and 
solved the structure by molecular replacement on the basis of our structure 
of MCM2 HBD in complex with an H3-H4 tetramer’ (Fig. 1a, b; 
for X-ray statistics, see Extended Data Table 1). The structure shows a 
pair of MCM2 HBDs wrapped around the lateral surface of the H3-H4 
tetramer, similar to the MCM2-HBD-H3-H4 complex alone”*®, while 
two TONSL ARDs interact with each of the H4 tails (Fig. 1a, b). The 
G4 linker along with flanking residues formed a 19-residue-long 
disordered segment that could reach a distance of up to 70 Å. The 
distance from the observed C terminus of MCM2 HBD to the observed 
N terminus of TONSL ARD is only 10 Å, indicating that the covalent 
linkage within the MCM2-HBD-G4-TONSL-ARD cassette does not 
affect the structural integrity of the complex. TONSL ARD forms no 
intermolecular interactions with the MCM2 HBD, consistent with 
H3-H4 bridging the interaction of TONSL and MCM2 in cells 
(Extended Data Fig. 1a), and it shows only minimal contacts with the 
core of the H3-H4 tetramer (Fig. 1a, b). However, the TONSL ARD 
forms extensive contacts with a segment of the H4 tail (Fig. 1b, c and 
Extended Data Fig. 1g) and, consistently, it binds the histone H4 tail, 
but not the H3 tail, in vitro (Extended Data Fig. 2a). In addition to 
defining TONSL binding to soluble histone H3-H4 in complex with 
MCM2 (Fig. 1a, b), this binding mode is also compatible with TONSL 
binding histone H3-H4 dimers in a co-chaperone complex with MCM2 
and ASF1 (ref. 7) as well as recognizing H4 tails in a nucleosome (see 
models in Extended Data Fig. 2b, c). 
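
The linker argument in the preceding paragraph can be checked with simple arithmetic. The sketch below assumes a rise of about 3.7 Å per residue for a fully extended polypeptide chain; that per-residue value is an assumption introduced here for illustration, while the 19-residue length and the 10 Å gap are the figures quoted in the text.

```python
# Assumed rise per residue of a fully extended polypeptide chain, in ångström.
# This value is an illustrative assumption, not a number given in the paper.
RISE_PER_RESIDUE = 3.7


def max_span(n_residues, rise=RISE_PER_RESIDUE):
    """Upper bound on the end-to-end distance of a disordered segment."""
    return n_residues * rise


linker_residues = 19   # disordered segment: glycine linker plus flanking residues
observed_gap = 10.0    # Å, MCM2 HBD C terminus to TONSL ARD N terminus

reach = max_span(linker_residues)
print(f"maximum reach of the linker: about {reach:.0f} Å")
print(f"observed gap: {observed_gap:.0f} Å")
print(f"linker slack: about {reach - observed_gap:.0f} Å")
```

Because the segment could span several times the observed gap, the linker is unlikely to pull the two domains into an arrangement they would not otherwise adopt, which is the point the authors make.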

The TONSL ARD consists of four ankyrin repeats, three of which 
adopt the canonical fold (ANK1-3), while the remaining one is an 
atypical and capping repeat (ANK4) (Extended Data Fig. 1g). The 
TONSL ARD uses its elongated concave surface to form extensive 
contacts with the H4 tail in an extended β-strand-like conformation 
(Fig. 1c and Extended Data Fig. 1g). Notably, 15 out of 18 residues 
that constitute the H4-tail-binding surface of TONSL ARD are highly 
conserved (Extended Data Fig. 2d). The TONSL ARD targets the H4 
tail, spanning residues Lys12 to Arg23 (Fig. 1c-g and Extended Data 
Fig. 3a, b) with three consecutive binding channels accommodating 
Arg17, His18 and Lys20 (Fig. 1d). These H4 residues are part of a basic 
region, which can interact with the acidic patch on neighbour nucle- 
osomes’ in compact chromatin. H4 Arg17 forms two hydrogen bonds 
with ARD Asn571 and stacks with Tyr572 and Cys608 (Fig. 1c, e), 
while H4 His18 penetrates into a pocket lined by four strictly conserved 
residues (Trp563, Glu568, Asn571 and Asp604) (Fig. 1c, f). Substitution 
of H4 His18 with the larger Trp residue (mutant H18W) disrupts 

Figure 1 | TONSL ARD interacts with the histone H4 tail. a, b, Two different 
representative views of the overall structure of the TONSL-ARD-MCM2-HBD-H3-H4 
tetramer complex. c, Intermolecular interactions between TONSL ARD and 
the H4 tail. d, The electrostatic potential surface of ARD showing the acidic 
concave surface binding site for the H4 tail. e-g, Highlights of the 
intermolecular interactions of H4 Arg17, His18 and Lys20 with ARD. 
h, Immunoprecipitation (IP) of soluble haemagglutinin (HA)-SNAP-H4 wild 
type (WT) or mutant transfected into green fluorescent protein (GFP)-TONSL 
U-2-OS cells. i, ITC of TONSL ARD wild type and mutants with H4 tail peptide. 
j, Immunoprecipitation of soluble GFP-TONSL wild type or mutant. 
h, j, Data are representative of three independent experiments. For protein 
inputs, see Extended Data Fig. 9b, c; for gel source data, see Supplementary Fig. 1. 

binding with ARD (Fig. 2a), underscoring the importance of fitting 
His18 in the pocket. The H4 Lys20 residue is bound within an acidic 
surface channel on ARD (Fig. 1c, d). The side chain of H4 Lys20 interacts 
with Met528 and contacts the edge of Trp563 of ARD, while the 
main-chain atoms of H4 Lys20 pack against Cys561 of ARD (Fig. 1g). 
The Nζ atom of H4 Lys20 forms three strong hydrogen bonds 
(distance <3 Å) with the side chains of strictly conserved residues 
Glu530, Asp559 and Glu568 of ARD, which engage H4 Lys20 in a triangular 
arrangement (Fig. 1g). Consistent with the structural data, histone 
H4 mutations R17A, H18A and K20A disrupted binding to TONSL in 
cells (Fig. 1h). Likewise, mutation of six conserved TONSL residues lining 
the H4 Arg17, His18 and Lys20 binding channels disrupted binding 
to H4 peptides and recombinant histone H3-H4 (Fig. 1i and Extended 
Data Fig. 3c). In vivo, these mutants abrogated binding to soluble 
histone H3-H4 and, consequently, also association with ASF1a and 
ASF1b and MCM2 without affecting MMS22L binding to TONSL!” 
(Fig. 1j and Extended Data Fig. 3d). These mutations did not affect ARD 
structure, as indicated by circular dichroism (Extended Data Fig. 3e). 

Figure 3 | H4K20me0 is a signature of post-replicative chromatin, read 
by TONSL ARD. a, H4K20me on new and old histones analysed by stable 
isotope labelling with amino acids in cell culture (SILAC)-based mass 
spectrometry of chromatin pulse-labelled with biotin-dUTP (b-dUTP) and 
isolated by NCC (data from ref. 18). noco, nocodazole. Error bars indicate 
s.d.; n = 9 (S), 3 (S/G2, M), 5 (G1); M (old histones) shows the mean of 
n = 2; see also Extended Data Fig. 6a, b. b, TONSL chromatin binding in 
pre-extracted TIG3 fibroblasts shown as a function of 4’,6-diamidino-2- 
phenylindole (DAPI) intensity or cell cycle stage. Error bars indicate s.d.; 
n = 886 (G1), 2,194 (S) and 756 (G2); representative cells are shown. 
c, d, Colocalization analysis of chromatin-bound GFP-TONSL with 
MCM2 (c) and EdU (d). d, Cells were pulsed with EdU (left) or released 
into S phase in the presence of EdU (right) and analysed by deconvolution 
microscopy. Error bars indicate s.d.; c, n = 13; d, from left, n = 9, 16, 10, 
9, 27, 36; representative cells are shown. Scale bar, 5 µm. e, Chromatin 
binding of GFP-TONSL analysed by cellular fractionation, quantified 
relative to total GFP-TONSL and normalized to wild type (WT). Error 
bars indicate s.d.; n = 5 (wild type/N571A); n = 3 (E568A). Unpaired t-test: 
****P < 0.0001. f, Chromatin binding of GFP-TONSL analysed as in b. 
Error bars indicate s.d.; from left, n = 1,302, 3,567, 750, 1,311, 3,850, 838, 
1,495, 3,221, 832, 1,695, 2,729, 877. Data are representative of two (b-d, f) 
independent experiments. 


Together, this defines TONSL ARD as a recognition module for histone 
H4 tails, distinct from the GLP/G9A ARDs that bind histone H3 tails 
mono- or dimethylated at K9 (Extended Data Fig. 4a, b)'®. 

The structure predicts that methylation on H4K20 should break 
critical hydrogen bonds with the TONSL ARD. Isothermal titration 
calorimetry (ITC) and H4-tail peptide pull-downs confirmed 
that H4K20me1/2 is incompatible with TONSL binding (Fig. 2a-c). 
Furthermore, H4K20me2 significantly reduced binding of full-length 
recombinant TONSL-MMS22L to reconstituted mononucleosomes 
(Fig. 2d). Recently, TONSL ARD with its neighbouring acidic stretch 
was proposed to bind H3K9me1 (ref. 5), but we were unable to detect 
an interaction between TONSL ARD (with or without the acidic 
stretch) and H3K9me1 peptides (Extended Data Fig. 4c, d). Together, 
our data show that TONSL binds to both free histones and nucleosomes 
via ARD recognition of H4 tails unmodified at K20 (Figs 1a-j, 2a-e 
and Extended Data Figs 1b, 2a-c, 3a-d). In line with this, H4K20me2 
was not detected on TONSL-bound nucleosomal histones (Fig. 2f), 
while H4K16ac was present (Extended Data Fig. 5a). H4K16ac stimulated 
TONSL binding in peptide pull-downs (Fig. 2b, c and Extended 
Data Fig. 5b) and slightly enhanced ARD binding in ITC (Fig. 2a), 
but it did not overturn the inhibitory effect of H4K20me2 (Fig. 2e). 
However, H4K16ac is not essential for TONSL binding in vivo, as 
soluble histone H4 does not carry H4K16ac!' and depletion of 
MOF, the major H4K16 acetyltransferase'’, did not significantly 
affect TONSL binding to chromatin (Extended Data Fig. 5c, d). In 
contrast, depletion of the H4K20 methyltransferase SET8 (also known 
as PR-SET7) significantly enhanced TONSL binding to chromatin in 
G1 cells in which H4K20me2 peaks!*'* (Fig. 2g and Extended Data 
Fig. 5e, f). We conclude that TONSL ARD is a histone-reader domain 
specific for H4 tails unmethylated at K20. 

Given that TONSL-MMS22L binds new histones (devoid of 
H4K20me!"”) in a pre-deposition complex with ASF1 and MCM2 
(Fig. 1j and Extended Data Fig. 1a)!, TONSL-MMS22L could be loaded 
onto replicating DNA together with new histones. To test how long 
after deposition new histones remain unmethylated at H4K20 with the 
potential to bind TONSL, we extracted H4K20 data from our recent 
large-scale proteomic study’, tracking modifications on new and old 
recycled histones by nascent chromatin capture (NCC)’? (Fig. 3a and 
Extended Data Fig. 6a, b). In nascent chromatin, new histones were 
exclusively unmethylated at H4K20 (98% H4K20me0), while old recycled 
histones were almost fully methylated at H4K20 (me1, 7%; me2, 
88%; me3, 2%). Consistent with previous work!?-!°, our analysis of 
primary cells (Extended Data Fig. 6c) and degradation of SET8 in S 
phase!, new histones became methylated in late G2/M, rendering G1 
chromatin devoid of H4K20me0 (Fig. 3a). This identifies H4K20me0 
on new histones as a signature of post-replicative chromatin, imply- 
ing that TONSL-MMS22L can bind H4 tails on new histones at rep- 
lication forks and sister chromatids until late G2/M. Confirming this 
prediction, TONSL accumulated on chromatin in S phase, remained 
chromatin-bound in a population of G2 cells, and was excluded from 
chromatin in G1 (Fig. 3b and Extended Data Fig. 6d-f). To discrimi- 
nate pre- and post-replicative chromatin, we labelled replicating DNA 
with 5-ethynyldeoxyuridine (EdU; pulse to mark ongoing replication, 
continuous labelling to identify post-replicative chromatin) and stained 
pre-replicative chromatin with MCM2 (refs 20, 21), and analysed colo- 
calization with TONSL. TONSL staining was mutually exclusive with 
MCM2 (Fig. 3c and Extended Data Fig. 7a), but colocalized with EdU 
pulse labelling in very early S phase and with replicated DNA (contin- 
uous EdU labelling) throughout S phase (Fig. 3d and Extended Data 
Fig. 7b, c). TONSL was present at sites of ongoing DNA replication 
throughout S phase, but the degree of colocalization declined in mid/ 
late S phase (Fig. 3d, left), consistent with TONSL binding to post- 
replicative chromatin also after fork passage (Fig. 3d, right). Mutation 
of the TONSL ARD abrogated recruitment of TONSL to chromatin, 
including DNA replication sites (Fig. 3e, f and Extended Data 
Fig. 7d-g). Together, these data demonstrate that TONSL is recruited to 
replication forks and post-replicative chromatin via ARD recognition 
of H4K20me0 on new histones. 
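
To see why H4K20me0 can discriminate replicated from unreplicated chromatin, the percentages quoted above can be combined in a short calculation. The sketch below is illustrative only: it assumes that whatever fraction of old histones is not me1, me2 or me3 is unmethylated, and that post-replicative chromatin carries new and old histones in a 1:1 ratio. Neither assumption is stated in the text.

```python
# Fractions quoted in the text for nascent chromatin.
NEW_ME0 = 0.98                                 # new histones carrying H4K20me0
OLD = {"me1": 0.07, "me2": 0.88, "me3": 0.02}  # old recycled histones

# Assumption: the remainder on old histones is counted as H4K20me0.
OLD_ME0 = 1.0 - sum(OLD.values())


def me0_fraction(new_share):
    """Overall H4K20me0 fraction for a given share of new histones (0 to 1)."""
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must lie between 0 and 1")
    return new_share * NEW_ME0 + (1.0 - new_share) * OLD_ME0


# Hypothetical scenarios: no new histones versus an assumed 1:1 new:old mix.
for label, share in [("old histones only", 0.0), ("assumed 1:1 new:old mix", 0.5)]:
    print(f"{label}: {me0_fraction(share):.1%} H4K20me0")
```

Under these assumptions roughly half of the H4 tails in newly replicated chromatin are unmethylated at K20, against a few per cent on old histones alone, which is the contrast a reader domain would need.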

Mutation of the TONSL ARD also abrogated chromatin binding 
and recruitment to replication forks in the presence of replication 
poisons such as camptothecin (CPT) and hydroxyurea (Fig. 4a-c). 
Furthermore, ARD mutation prevented accumulation of TONSL at 
site-specific double-strand breaks (DSBs; Fig. 4d and Extended Data 
Fig. 8a) and microlaser-generated DNA damage (Fig. 4e and Extended 
Data Fig. 8b, c). Co-staining with cell cycle markers confirmed that 

Figure 4 | H4K20me0 recognition is required for TONSL accumulation 
at DNA repair sites and genome stability. a, Chromatin-binding of 
GFP-TONSL in CPT-treated S-phase cells. WT, wild type. Error bars 
indicate s.d.; from left, n = 1,461, 2,631, 1,245, 1,764, 2,116, 3,178. 

b, Co-immunoprecipitation (IP) of TONSL-MMS22L with Flag-HA-MCM2 
wild type or histone-binding mutant (Y81A, Y90A)’ from chromatin 
after hydroxyurea (HU) treatment. c, NCC analysis of GFP-TONSL 
recruitment to replication forks in CPT-treated cells. Minus sign indicates 
no b-dUTP. d, Chromatin immunoprecipitation (ChIP) and quantitative 
polymerase chain reaction (qPCR) analysis of GFP-TONSL recruitment 
to site-specific DSBs induced by AsiSI°’. See Extended Data Fig. 8a for 
additional controls. e, GFP-TONSL recruitment to laser-induced DNA 
lesions (error bars indicate s.d.; n = 3; total cells counted, 210 (wild type) 


TONSL is recruited to DNA repair sites only in S and G2 cells, as 
expected? (Fig. 4e and Extended Data Fig. 8d, e). We conclude that 
H4K20me0 binding is required for TONSL accumulation at damaged 
forks and DNA lesions in post-replicative chromatin. However, this was 
not due to increased H4K20me0 (Extended Data Fig. 8f), suggesting 
that unmasking of H4 tails upon chromatin decompaction””” and/or 
interaction with repair factors contribute to TONSL-MMS22L accu- 
mulation at repair sites. Consistent with an auxiliary mode of recruit- 
ment, MMS22L interaction with RAD51 can stabilize the complex at 
challenged forks (P. Cejka and M. Peter, personal communication). Our 
data suggest that this is subsequent to H4K20me0 binding (Fig. 4a-e), 
and we thus next addressed the contribution of H4K20me0 recognition 
to TONSL-MMS22L function. In complementation analysis, TONSL 
wild type partially rescued the viability of TONSL-depleted cells in the 
presence and absence of CPT (Fig. 4f and Extended Data Fig. 8g, h), 
whereas TONSL ARD mutants were toxic (Fig. 4f and Extended 
Data Fig. 8g, h). In control cells, TONSL ARD mutants also reduced 
viability, causing G2/M arrest accompanied by replication-associated 
DNA damage (Fig. 4g, h). Furthermore, TONSL ARD mutants titrated 
MMS22L away from chromatin (Fig. 4i and Extended Data Fig. 8i), 
explaining the dominant-negative phenotype that mimics TONSL- 
MMS22L depletion’. Collectively, this indicates that recognition of 


and 252 (N571A)). f, g, Colony formation upon GFP-TONSL induction 
by tetracycline (+tet) in siRNA- and CPT-treated cells. h, Cell cycle and 
53BP1 foci analysed by microscopy. Left, percentage of G2/M cells shown 
relative to non-induced cells (—tet). Error bars indicate s.d., n = 4 (left), 

5 (right). i, Chromatin-bound MMS22L analysed as in Fig. 3e. Mean 
with individual data points are shown (n = 3 (untreated), 2 (CPT)), see 
Extended Data Fig. 8i for western blots. j, TONSL-MMS22L identifies 
post-replicative chromatin by binding H4K20me0 on new histones, 
directing TONSL-MMS22L genome surveillance function to DNA having 
a sister chromatid. Data are representative of three (a), two (b-d, f, right, g), 
and four (f, left) independent experiments. For protein inputs, see 
Extended Data Fig. 9e, f. 


H4K20me0 is central to TONSL-MMS22L function in safeguarding 
genome stability. 

This study reveals that post-replicative chromatin has a distinct his- 
tone modification signature, read by the TONSL-MMS22L effector 
protein (Fig. 4j). This opens a new avenue to understand how DNA 
repair and other chromosomal transactions can be directly linked to 
the replication state of a genomic locus. Intriguingly, it is the new his- 
tones that make post-replicative chromatin distinct, and in this way 
H4K20me0 resembles the behaviour of H3K56ac” in yeast. Our data 
indicate that TONSL-MMS22L is delivered to nascent chromatin with 
new histones via the pre-deposition complex with MCM2 and ASF1 
(Fig. 4j). We favour the idea that TONSL has a dual function as a his- 
tone chaperone’ and histone reader. Our structural work proposes that 
TONSL acts in a histone chaperone-like capacity by sequestering the 
H4 tail to prevent spurious contacts with DNA during H3-H4 depo- 
sition. Furthermore, TONSL ARD may counteract chromatin com- 
paction by preventing association of the H4 tail with the H2A-H2B 
acidic patch on neighbouring nucleosomes. Thus, TONSL changes 
our perception of a histone chaperone by binding both soluble and 
nucleosomal histones. In its function as a histone reader, TONSL 
localizes MMS22L to post-replicative chromatin via H4K20me0 and 
allows TONSL-MMS22L to accumulate at damaged forks and DNA 
lesions. We envision that H4K20me0 works as an affinity trap, making 
TONSL-MMS22L readily available to support RAD51 loading during 
homologous recombination. This provides a new approach and 
opportunity to understand the role of H4K20 in DNA repair, comple- 
menting the well-described role of H4K20me1/2 in recruiting 53BP1 
to promote non-homologous end joining in competition with BRCA1- 
BARD1 (refs 24, 25). In post-replicative chromatin, H4K20me1/2 on 
old histones will support 53BP1 recruitment. Whether H4K20me0 
on new histones also influences DNA repair pathway choice will be 
of interest in future investigations. It is notable that the structure of 
the TONSL ARD, including the histone-binding surface, is highly 
similar to the ARD of BARD1 (Extended Data Fig. 9a)”°, required for 
BRCA1 tumour suppressor function and homologous recombination’. 
Multiple mutations in the TONSL ARD are reported in cancer (C608G, 
P557S, E597K; http://cancer.sanger.ac.uk) and the N571 residue, key 
to histone H4 binding, corresponds to the BARD1 N470S cancer 
mutation”®”’. This highlights the tumour suppressor function of 
H4K20me0 recognition, and the possibilities it brings for targeted 
cancer therapy should be explored in the future. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Protein expression and purification. No statistical methods were used to prede- 
termine sample size. The experiments were not randomized. The investigators were 
not blinded to allocation during experiments and outcome assessment. All proteins 
used in this study, unless otherwise indicated, were expressed in BL21(DE3)-RIL 
cell strain (Stratagene). The human TONSL ARD (residues 512-692) and MCM2 
HBD (residues 61-130) were covalently linked through a four-glycine linker 
(G4 linker) into one expression cassette. The MCM2 HBD-G4-TONSL ARD 
expression cassette was cloned into a modified RSFDuet-1 vector (Novagen), 
with an N-terminal His6-SUMO tag. The resulting plasmid was co-expressed with 
a plasmid harbouring histone genes H3.3(Δ56) and H4. The expressed protein 
complex was first purified on an Ni-NTA affinity column. After removing the 
His6-SUMO tag by using Ulp1 (SUMO protease), the protein complex was further 
purified on a HiLoad 16/600 Superdex 200 column (GE Healthcare). 

The GST-tagged TONSL ARD and its mutants including E530A, D559A, 
W563A, E568A, N571A and D604A were cloned into pGEX-6P-1 vector (GE 
Healthcare). The expressed proteins were first purified using Glutathione 
Sepharose 4B, then further purified by a gel-filtration step. In some cases, the GST 
tag was removed with 3C protease before the gel-filtration step. For purification of 
GST-H3 tail and GST-H4 tail proteins, the human histone H3 fragment 1-59 and 
H4 fragment 1-31 were cloned into pGEX-6P-1 vector respectively. The proteins 
were expressed and purified in the same way. 

For production of recombinant full-length TONSL-MMS22L heterodimer, 
the sequence coding for full-length MMS22L was fused with an MBP tag at the 
5’ end and 10x His tag at the 3’ end. The sequence coding for full-length TONSL 
was fused with a GST tag at the 5’ end. Both MMS22L and TONSL constructs 
were cloned into a pFastBac1 vector. The complex was expressed in Sf9 cells by 
co-infection with both recombinant baculoviruses according to the manufacturer's 
recommendation (Invitrogen). The proteins were extracted from Sf9 cells and puri- 
fied similarly as described previously for Sgs1 (ref. 30). Briefly, the complex was 
purified on amylose resin, and MBP and GST tags were subsequently cleaved with 
PreScission protease. The heterodimer was then further purified using a Ni-NTA 
affinity resin. Washes were performed with 300 mM NaCl buffer. 
Crystallization. We first tried to crystallize the TONSL ARD in complex with an H4
tail or an H3-H4 tetramer, but this failed even with extensive screening. Reasoning
that an additional binding protein might stabilize the whole complex and aid
crystallization, we then tried to crystallize the TONSL ARD in complex with the
MCM2 HBD and the H3-H4 tetramer. This yielded only very small crystals, and we
could not obtain large, well-diffracting ones. We suspected that the complex of
TONSL ARD with MCM2 HBD and the H3-H4 tetramer was destabilized by the harsh
crystallization conditions and formed a subcomplex, which hindered optimization
of the crystals. We therefore covalently linked the TONSL ARD and MCM2 HBD into
one cassette through glycine linkers of different lengths (Gn linker). Two longer
linkers and the G10, G9, G8, G7, G6, G5 and G4 linkers were tried, and all of these
cassettes could be crystallized. The construct with a G4 linker gave
well-diffracting crystals.

The G4 linker complex, the MCM2-HBD-G4-TONSL-ARD cassette-H3.3(Δ56)-H4
complex (herein denoted the TONSL-ARD-MCM2-HBD-H3-H4 tetramer
complex), at a concentration of 23 mg ml⁻¹ was crystallized in 0.1 M MES pH 5.6,
7% isopropanol using the sitting-drop vapour-diffusion method at 20 °C. All the
crystals were soaked in a cryoprotectant made from mother liquor supplemented with
25% glycerol before flash freezing in liquid nitrogen.

Structure determination. The data sets for the TONSL-ARD-MCM2-HBD-H3-H4
tetramer complex were collected at 0.979 Å on 24-ID-C/E NE-CAT (Advanced
Photon Source, Argonne National Laboratory). All the data sets were processed
using the HKL 2000 program. The initial structure of the TONSL-ARD-MCM2-HBD-H3-H4
tetramer complex was solved by molecular replacement in PHASER (ref. 31) with
our previous structure of the MCM2-HBD-H3-H4 tetramer complex as a search
model, and was manually refined and built using Coot (ref. 32). The final
structure of this complex was refined to 2.43 Å resolution using PHENIX (ref. 33). The
Ramachandran plot showed 95.9% of residues in favoured and 4.1% in allowed regions.
Extended Data Table 1 summarizes the statistics for data collection and structural
refinement.

Preparation of recombinant modified mononucleosomes. Recombinant
human histone proteins were expressed in Escherichia coli BL21(DE3)-RIL cells
from pET21b(+) (Novagen) vectors and purified by denaturing gel filtration and
ion-exchange chromatography essentially as described (ref. 34). All histone proteins
were dialysed into water containing 1 mM dithiothreitol (DTT), lyophilized and stored
dry at −80 °C. Modified H4 proteins were generated by native chemical ligation
essentially as described for H3 (ref. 35). Briefly, tail-less H4 Δ1-28 I29C protein
was expressed in E. coli BL21(DE3)-RIL cells from pET24b(+) (Novagen) and
purified by denaturing gel filtration and reversed-phase chromatography using
a ResourceRPC column (GE Healthcare). Purified H4 Δ1-28 I29C was then
ligated to N-terminally acetylated H4 1-28 thioester peptides (Almac) and
full-length ligated H4 was separated from unligated H4 Δ1-28 I29C by reversed-phase
chromatography via a C18 column (Aquapore RP-300/Perkin Elmer) using a
gradient from 35% B to 45% B over 20 column volumes (A: 0.1% TFA in water; B:
90% acetonitrile, 0.1% TFA). Ligated H4 was directly lyophilized and stored dry
at −80 °C. Ligated H4 was refolded into octamers together with purified histones
H2A, H2B and H3.1 and then assembled into nucleosomes with biotinylated
601 DNA as described.

GST pull-downs. For pull-downs of GST-ARD and its mutants (E530A,
D559A, W563A, E568A, N571A and D604A) with H3-H4, 25 µl of Glutathione
Sepharose 4B beads were first suspended in 200 µl of binding buffer (20 mM Tris
pH 7.5 and 0.5 M NaCl), and 1 nmol of GST-ARD protein was added and
incubated at 23 °C for 10 min. Then 0.5 nmol of pre-purified H3-H4 tetramers was
added and incubated for another 1 h. The beads were washed quickly five times
with 1 ml of washing buffer (binding buffer, 1% Triton X-100) before adding
50 µl of sample loading buffer. An aliquot of 20 µl of each sample was analysed by
SDS-PAGE. The GST pull-downs of the histone tails GST-H3(1-59) and GST-H4(1-31)
with TONSL ARD were performed similarly.

Circular dichroism. Circular dichroism spectra were acquired using a Jasco J-815 
Circular Dichroism Spectropolarimeter with a 1 mm quartz cuvette. Spectra were 
recorded for wild-type and mutant TONSL ARD (512-692, 6.25 µM) between 
260 nm and 195 nm in KH2PO4/K2HPO4 buffer (25 mM, pH 7.8) with a data 
pitch of 0.5 nm, bandwidth of 1 nm and with three accumulations at a scanning 
speed of 50 nm min⁻¹. 

In vitro translation and pull-downs with H3-H4 sepharose beads.
NHS-activated sepharose 4 Fast-Flow beads (GE Healthcare) were washed with 0.1 M
HCl and incubated overnight with 1 µM recombinant histone H3.1-H4 tetramers
(New England Biolabs, catalogue number M2509S) or 1 µM recombinant histone
H2A-H2B dimers (New England Biolabs, catalogue number M2508S) in Coupling
Buffer (0.2 M NaHCO3, 0.2 M NaCl). One microgram of pSC-B-TONSL,
pEXPR-IBA-105-ASF1A wild-type and pEXPR-IBA-105-ASF1A V94R plasmids was
incubated with TnT Quick Coupled Transcription/Translation System (Promega)
and 35S-methionine according to the manufacturer’s instructions. Ten microlitres
of in vitro translation (IVT) mixture were added to the H3.1-H4 or H2A-H2B
sepharose beads and incubated for 2 h. Beads were washed with 200 mM NaCl,
0.2% NP40 buffer. Beads were boiled in 1× LSB and loaded on a 4-12% Bis-Tris
NuPage gel (Life Technologies). Proteins were transferred to a 0.2 µm nitrocellulose
membrane by overnight wet transfer at 20 V and the membrane was incubated
in an autoradiography cassette for 24 h before detection by Phosphor Imager
(PerkinElmer).

ITC experiments. All the ITC titrations were performed on a Microcal ITC 200
calorimeter at 25 °C or 20 °C. The H4 peptide (residues 9-25) and its modified
variants K16ac (acetylation on Lys16), H18W (His18 mutated to Trp),
H4K20me1 (monomethylation on Lys20) and H4K20me2 (dimethylation
on Lys20), and the H3(1-21)K9me1 peptide (monomethylation on Lys9), were all
synthesized at the Tufts University Core Facility. The exothermic heat of the
reaction was measured by 17 sequential 2.2 µl injections of the peptides (1.41 mM in
buffer: 20 mM Tris pH 7.5 and 0.5 M NaCl) into 200 µl of the TONSL ARD solution
(145 µM in the same buffer), spaced at intervals of 150 s or 180 s. The data were
processed with Microcal Origin software and the curves were fitted to a single-site
binding model.
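
As a rough consistency check on the titration design described above, the following sketch computes the cumulative peptide-to-protein molar ratio after each injection. It is a simplification that ignores the volume displaced from the calorimeter cell at each injection (which instrument software normally corrects for), and the function and variable names are illustrative rather than part of the original analysis.

```python
def cumulative_molar_ratios(n_injections, injection_ul, syringe_mm, cell_ul, cell_um):
    """Return the peptide:protein molar ratio after each injection.

    Simplifying assumption: no correction for cell overflow or displacement.
    """
    # µl x µM gives pmol; divide by 1000 to express the protein amount in nmol
    protein_nmol = cell_ul * cell_um / 1000.0
    ratios = []
    for i in range(1, n_injections + 1):
        # µl x mM gives nmol of peptide delivered so far
        peptide_nmol = i * injection_ul * syringe_mm
        ratios.append(peptide_nmol / protein_nmol)
    return ratios


# Values taken from the text: 17 injections of 2.2 µl, 1.41 mM peptide,
# 200 µl of 145 µM TONSL ARD in the cell
ratios = cumulative_molar_ratios(17, 2.2, 1.41, 200.0, 145.0)
print(f"final molar ratio ~ {ratios[-1]:.2f}")
```

Under these assumptions the titration ends a little below a twofold molar excess of peptide, which is in the range typically wanted for fitting a single-site model.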

Peptide pull-down assays. Purified recombinant TONSL ARD (residues 512-692)
was stored at 400 µM in 1 M NaCl, 20 mM Tris HCl pH 7.5 at −80 °C. For
each pull-down, 400 pmol of the ARD stock (1 µl, 400 µM) was diluted with 99 µl of
binding buffer (150 mM NaCl, 50 mM Tris HCl pH 7.5, 5% glycerol, 0.25% NP-40,
0.2 mM EDTA, 0.5 mM DTT, 0.2 mM PMSF, 1 mM leupeptin, 1 mM pepstatin).
ARD input material was scaled to the number of pull-downs performed. For each
pull-down, an H4 peptide (JPT Peptide Technologies GmbH) spanning residues
14-33 (2.5 µl, 250 µM) with a C-terminal biotinoyl-lysine residue or, as control,
biotin (2.5 µl, 400 µM) was added to 1.1 ml of binding buffer in addition to 100 µl
of the ARD input material and the mixture was incubated overnight with rotation at
4 °C. The next day, 25 µl of MyOne Streptavidin C1 beads (Life Technologies) was
washed in binding buffer (3 × 500 µl) for each pull-down, removing the final wash
from the beads. The ARD/peptide or ARD/biotin mixture was added to an aliquot
of pre-washed MyOne Streptavidin C1 beads and incubated with rotation at 4 °C
for 3 h. Finally, the beads were washed (2 × 300 µl and 1 × 200 µl of 300 mM NaCl,
50 mM Tris HCl pH 7.5, 5% glycerol, 0.25% NP-40, 0.2 mM EDTA, 0.5 mM DTT,
0.2 mM PMSF, 1 mM leupeptin, 1 mM pepstatin) and pull-down material was
visualized by Coomassie staining after SDS-PAGE separation of proteins on a
NuPAGE 4-12% gel.

For pull-downs from cell extracts, MyOne T1 beads were incubated overnight
with 1 µg of biotinylated peptides in high salt (HS; 300 mM NaCl, 0.5% NP40,
Tris HCl, EDTA, 5% glycerol) buffer and subsequently washed twice with PBS.
One milligram of NP40/NaCl extract from HeLa S3 or GFP-TONSL U-2-OS cells
was added to the beads and incubated for 2 h rotating at 4 °C. The beads were
then washed five times with HS buffer, 2 min rotating at 4 °C. After washing, the
beads were resuspended in 1× LSB and boiled for 10 min. The eluted proteins were
loaded on a 4-12% Bis-Tris NuPage gel (Life Technologies). Proteins were then
transferred to a 0.2 µm nitrocellulose membrane by overnight wet transfer at 20 V
and detected by western blotting.
Nucleosome pull-down assay. Modified nucleosomes carrying H4K20me0 or
H4K20me2 were prepared by peptide ligation and stored at 0.1 µg µl⁻¹ (by histone
octamer) in 100 mM NaCl, 50 mM Tris HCl pH 7.5 at 4 °C. Full-length
TONSL-MMS22L complex was stored at 746 nM in 100 mM NaCl, 50 mM Tris HCl pH 7.5,
5 mM β-mercaptoethanol, 10% glycerol, 0.5 mM PMSF at −80 °C. Nucleosome
pull-downs were performed across two sets of conditions (n = 3 for each
condition) in the presence of herring sperm competitor DNA. Condition number 1:
nucleosomes (1 µg by histone octamer) or biotin (0.5 µg) were mixed with
TONSL-MMS22L (1.9 pmol) and made up to 30 µl with binding buffer (500 mM NaCl,
50 mM Tris HCl pH 7.5, 20% glycerol, 0.1% NP-40, 1 mM DTT, protease inhibitors
and 10 µg ml⁻¹ herring sperm DNA (Sigma)). Inputs of 10 µl were taken before
diluting each sample with binding buffer to a final volume of 300 µl and
incubating overnight at 4 °C. Pull-downs were performed by adding 20 µl of MyOne
Streptavidin C1 beads, prewashed and resuspended in 100 µl of binding buffer, to
each pull-down reaction, incubating at 4 °C for 2 h and washing with 5 × 500 µl
binding buffer for 2 min at room temperature. Condition number 2: nucleosomes
(0.5 µg by histone octamer) or biotin (0.5 µg) were mixed with TONSL-MMS22L
(1.3 pmol) and made up to 30 µl with binding buffer (500 mM NaCl, 50 mM Tris HCl
pH 7.5, 5% glycerol, 0.5% NP-40, 0.2 mM EDTA, 1 mM DTT, protease inhibitors and
10 µg ml⁻¹ herring sperm DNA (Sigma)). Inputs of 15 µl were taken before diluting
each sample with binding buffer to a final volume of 500 µl and incubating
overnight at 4 °C. Pull-downs were performed by adding 10 µl of MyOne Streptavidin
T1 beads, prewashed and resuspended in 100 µl of binding buffer, to each pull-down
reaction, incubating at 4 °C for 4 h and washing with 5 × 500 µl binding buffer
for 2 min at room temperature. Pull-downs were visualized by SYPRO Ruby staining
after SDS-PAGE separation of proteins on a NuPAGE 4-12% gel using an ImageQuant
LAS 4000 (GE Healthcare). The intensity of stained bands was quantified using
ImageJ, and TONSL intensity was normalized to the combined intensity of H3, H2A
and H2B. Statistical analysis was performed on data from the six independent
experiments using the unpaired t-test with equal standard deviations in Prism 6.
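
The quantification described above (normalizing TONSL band intensity to the combined H3, H2A and H2B intensity, then comparing groups with an unpaired equal-variance t-test) can be sketched as follows. This is a minimal illustration in plain Python, not the Prism 6 workflow used in the study; the function names and the intensity values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def normalized_tonsl(tonsl, h3, h2a, h2b):
    """TONSL band intensity divided by the combined H3 + H2A + H2B intensity."""
    return tonsl / (h3 + h2a + h2b)


def pooled_t_statistic(a, b):
    """Unpaired two-sample t statistic assuming equal variances (Student's t).

    Returns the t statistic and the degrees of freedom.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical normalized intensities, six experiments per nucleosome type
me0 = [1.00, 0.90, 1.10, 1.05, 0.95, 1.00]
me2 = [0.40, 0.50, 0.45, 0.35, 0.55, 0.45]
t, dof = pooled_t_statistic(me0, me2)
print(f"t = {t:.2f}, degrees of freedom = {dof}")
```

Converting the t statistic to a P value requires the t distribution, which statistics packages provide; the sketch stops at the statistic to stay dependency-free.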
Cell culture, transfection and drug treatment. U-2-OS (gift from J. Bartek),
HeLa S3 (gift from P. Nakatami) and TIG-3 (gift from K. Hansen) cells were
grown in DMEM (Gibco) containing 10% FBS (Hyclone), 1% penicillin/streptomycin
and drugs for selection. The construct for siRNA-resistant GFP-TONSL
was described previously (ref. 2) and ARD mutations were introduced into this
construct by site-directed mutagenesis. The pBABE-SNAP-HA-H4 plasmid was
previously described and H4 tail mutations were introduced into this construct by
site-directed mutagenesis. Cells inducible for GFP-TONSL wild type and ARD
mutants were generated in Flp-In T-Rex U-2-OS cells (Invitrogen) by transfection
of pcDNA5/FRT/TO-GFP-TONSL plasmids with Lipofectamine 2000, according
to the manufacturer’s protocol, and selection with hygromycin (200 µg ml⁻¹).
Previously described inducible GFP-TONSL U-2-OS cells (ref. 1) were used for
Fig. 1h and Fig. 2c, e. U-2-OS Flag-HA-MCM2 wild type and Y81A and Y90A
cells were previously described. pBABE-AsiSI-ER-HA was introduced into
inducible GFP-TONSL cell lines by lentiviral infection and puromycin selection.
All cell lines were authenticated by western blotting and/or immunofluorescence.
All cell lines used in this study tested negative for mycoplasma contamination.
Expression of GFP-TONSL was induced by addition of 1 µg ml⁻¹ of tetracycline
for 24 h. U-2-OS and TIG3 cells were synchronized by a single thymidine block
(2 mM) and released into S phase in the presence of 24 µM dCTP. For transient
expression of GFP-TONSL or SNAP-HA-H4 (Fig. 1h, j), expression plasmids
were introduced by transfection with Lipofectamine 2000 (Invitrogen) according
to the manufacturer’s protocol and cells were harvested 24 h after transfection.
siRNA transfection was performed with RNAiMax reagent (Invitrogen) according to
the manufacturer's protocol. All siRNAs were used at a final concentration of 50 nM.
siRNA sequences (Sigma): siSET8#1: 5′-GUACGGAGCGCCAUGAAGU-3′;
siSET8#2: 5′-ACUUCAUGGCGCUCCGUACUU-3′ (ref. 37); siMOF#1:
5′-GUGAUCCAGUCUCGAGUGA-3′ (ref. 12); siMOF#2:
5′-GAGAUCAACCAUGUGCAGA-3′; siTONSL: 5′-GAGCUGGACUUAAGCAUGA-3′ (ref. 2).
Drug treatment was as follows. CPT: cells were treated with 1 µM CPT for
3 h (Fig. 4a, i and Extended Data Fig. 8f) or 20 min (Fig. 4c), or with 50 nM CPT
for 24 h (Fig. 4f). Hydroxyurea: cells were treated with 3 mM hydroxyurea for 2 h
(Fig. 4b) or 3 h (Extended Data Fig. 8f).
Cell extracts and chromatin solubilisation. For detergent/salt soluble extracts,
HeLa S3 and U-2-OS cells were washed with cold PBS, scraped and incubated for
15 min on ice in HS buffer supplemented with trichostatin A (TSA) and protease
and phosphatase inhibitors (5 mM sodium fluoride, 10 mM β-glycerolphosphate,
0.2 mM sodium vanadate, 10 µg ml⁻¹ leupeptin, 10 µg ml⁻¹ pepstatin, 0.1 mM
PMSF, Sigma). After centrifugation at 16,000g for 15 min at 4 °C, the
supernatant was collected. To analyse chromatin-bound complexes, cells were washed
twice in cold PBS, scraped and centrifuged at 1,500g for 10 min at 4 °C. The pellet
was incubated on ice for 10 min in CSK buffer (10 mM PIPES pH 7, 100 mM
NaCl, 300 mM sucrose, 3 mM MgCl2)/0.5% Triton X-100, supplemented with
TSA and protease and phosphatase inhibitors (5 mM sodium fluoride, 10 mM
β-glycerolphosphate, 0.2 mM sodium vanadate, 10 µg ml⁻¹ leupeptin, 10 µg ml⁻¹
pepstatin, 0.1 mM PMSF, Sigma) and subsequently centrifuged at 1,500g for 10 min
to collect soluble proteins. For DNase I or benzonase release of chromatin material,
the remaining pellet was resuspended in CSK/0.1% Triton X-100 containing DNase I
(1,000 U ml⁻¹, Roche) or benzonase (2,500 U ml⁻¹, Millipore), and incubated at
30 °C for 30 min. Solubilized chromatin was then collected by centrifugation at
16,000g for 10 min.

Immunoprecipitation from cell extracts. Immunoprecipitation was performed 
with agarose magnetic GFP-Trap beads (Chromotek), anti-Flag magnetic beads 
(Sigma) and anti-HA magnetic beads (Life Technologies). Cell extracts were incu- 
bated with beads for 2h at 4°C rotating. The beads were subsequently washed five 
times with HS buffer and resuspended in 1 x LSB before boiling and SDS-PAGE 
separation of proteins on a NuPAGE 4-12% gel. 

Western blotting and antibodies. The following antibodies were used: TONSL
(Abcam ab101898), TONSL (Sigma, HPA024679; validated in Extended Data
Fig. 6d), MMS22L (ref. 1), H3 (Abcam ab1791, Abcam ab10799), GFP (Santa Cruz
sc-8334, Abcam ab290), biotin (Abcam ab53494), MCM2 (BD Biosciences
610701), H2B (Abcam ab1790), H4K16ac (Millipore 07-329), H4K20me1 (Abcam
ab9051), H4K20me2 (Cell Signaling 9759), 53BP1 (Santa Cruz sc-22760; Novus
Biologicals NB100-904), γ-H2AX (Millipore 05-636), Cyclin B (BD Biosciences
610220), RPA70 (Abcam ab79398), SET8 (Millipore, 06-1304), MCM3 (Abcam
ab4460). Secondary antibodies conjugated with horseradish peroxidase (HRP) were
from Jackson ImmunoResearch Labs. Signals were revealed by chemiluminescence
substrate from Pierce (SuperSignal West Pico or SuperSignal West Femto).

FACS and analysis. For analysis of cell cycle progression, cells were fixed in 70% 
ethanol and stained with propidium iodide/RNase for 30 min in the dark, before 
analysis on a FACS Calibur machine. FACS profiles were analysed by FlowJo 10.0.8 
software. 

Mass spectrometry. Histones from TIG3 fibroblasts were extracted from
chromatin as previously described. Protein was resuspended in 50 µl of 100 mM
triethylammonium bicarbonate (TEAB; Sigma), pH adjusted with 2 µl of 1.5 M
Tris pH 8 and digested for 16 h at 37 °C with 34 µl of 20 ng µl⁻¹ Asp-N (Wako) in
100 mM TEAB. After 15 min centrifugation at 10,000g at 25 °C, the supernatant
was placed in a new tube and digestion of the pellet was repeated for 4 h
under the conditions described above. The peptides from both digestions
were merged, acidified with 10 µl of 1% TFA and purified using sequential Stagetip
C18 and Carbon Toptip (Glygen). Purified peptides were evaporated and resuspended
in 15 µl of 0.1% TFA. Injected material was normalized so that the histones analysed
by liquid chromatography mass spectrometry (LC-MS) corresponded to 9.0 x 10°
cells. The LC method was used as described elsewhere. The MS was performed
on an Orbitrap Classic with settings similar to those described previously (ref. 18)
but with a survey scan range of 550-690 m/z and MS2 set in scheduled and targeted
data-independent mode for the quadruply charged ions of the four different methylation
states (unmodified, mono-, di- and trimethylated H4K20). Peptides were
quantified using the peak area from the corresponding extracted ion chromatograms
(±10 p.p.m.).
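
To illustrate what a ±10 p.p.m. extraction window means in practice, the sketch below converts a parts-per-million tolerance into m/z bounds and turns peak areas into fractional abundances. The relative-abundance step, the example m/z value and the peak areas are assumptions made for illustration; the text above states only that peak areas from the extracted ion chromatograms were used.

```python
def ppm_window(mz, tol_ppm=10.0):
    """Return (low, high) m/z bounds for a +/- tol_ppm extraction window."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def relative_abundance(peak_areas):
    """Convert peak areas per methylation state into fractions of the total."""
    total = sum(peak_areas.values())
    return {state: area / total for state, area in peak_areas.items()}


# 600.0 is a placeholder m/z inside the 550-690 survey range, not a measured ion
low, high = ppm_window(600.0)

# Hypothetical peak areas for the four H4K20 methylation states
areas = {"me0": 2.0e6, "me1": 1.0e6, "me2": 6.5e6, "me3": 0.5e6}
fractions = relative_abundance(areas)
print(f"window: {low:.4f} to {high:.4f} m/z")
```

At m/z 600 the window is only about 0.006 m/z units wide on either side, which is why high-resolution instruments are needed for this kind of quantification.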

Immunofluorescence, microscopy and laser microirradiation. U-2-OS cells 
conditional for GFP-TONSL were grown on glass coverslips or 96-well plates and 
either directly fixed in 4% paraformaldehyde (PFA) for 10 min or washed in CSK, 
pre-extracted 5 min with cold CSK/0.5% Triton X-100 and rinsed with CSK and 
PBS before fixation in 4% PFA for 10 min. Coverslips were mounted on glass slides 
with Mowiol mounting medium (Sigma-Aldrich) containing DAPI. Fluorescence 
images were collected on a DeltaVision system with a ×40 or ×60 oil immersion 
objective. For colocalization analysis by deconvolution microscopy, z-stacks were 
acquired (step of 0.2 µm), deconvolved and analysed by SoftWoRX 5.0.0. Pearson 
correlation coefficient analysis was performed on single cells using SoftWoRX 
5.0.0. Brightness and contrast were adjusted using Adobe Photoshop CS6. For 
high-content quantitative analysis, fluorescence images were acquired using an 
Olympus ScanR high-content microscope and processed on the ScanR analysis 
software. More than 5,000 cells per sample were analysed. Cell cycle phases were 
gated on DAPI and EdU intensity. Graphs were generated with TIBCO Spotfire 
software. For microirradiation experiments, cells grown on glass coverslips were 
fixed in 4% formaldehyde for 15 min, permeabilized with PBS containing 0.2% 
Triton X-100 for 5 min and incubated with primary antibodies diluted in DMEM 
for 1h at room temperature. After staining with secondary antibodies (Alexa 
Fluor 488, 568 and 647; Life Technologies) for 30 min, coverslips were mounted 
on glass slides in Vectashield mounting medium (Vector Laboratories) containing 
the nuclear stain DAPI. For detection of nucleotide incorporation during DNA 
replication, an EdU-Plus labelling kit (Life Technologies) was used according to 
the manufacturer's instructions. Confocal images were acquired on an LSM-780 
(Carl Zeiss) mounted on a Zeiss-AxioObserver Z1 equipped with a Plan-Neofluar 
×40/1.3 oil immersion objective. Image acquisition and analysis were carried out 
with LSM-ZEN software. Laser microirradiation of cells was performed essentially 
as described. 
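
The Pearson correlation used for the colocalization analysis in this section can be illustrated with a short sketch. It assumes that the two fluorescence channels of a single cell have been flattened into equal-length lists of pixel intensities; the study itself used SoftWoRX, so the function name and the sample values here are purely illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length and at least two values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / (sx * sy)


# Hypothetical pixel intensities from two channels of one nucleus
channel_a = [10, 20, 30, 40, 50]
channel_b = [12, 24, 33, 41, 55]
r = pearson_r(channel_a, channel_b)
print(f"Pearson r = {r:.3f}")
```

Values near 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion.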

ChIP. GFP-TONSL wild type and N571A U-2-OS cells harbouring the inducible
ER-HA-AsiSI endonuclease were treated with 4-OHT and 10 µM DNA-PK
inhibitor NU7026 (Millipore) for 4 h to increase homologous recombination.
Cells were cross-linked for 10 min in 1% formaldehyde and chromatin was
fragmented by sonication using a Bioruptor Sonicator (Diagenode). ChIP was
performed as previously described with the following modifications: 30 µg
of chromatin was immunoprecipitated with 5 µg of anti-GFP (Abcam ab290)
and rabbit IgG. Immunoprecipitated DNA was analysed in duplicate by
RT-qPCR. In all cases, γH2A.X induction was verified by immunofluorescence
and a sample without 4-OHT was included as a ‘no cut’ control. Primer pairs
for the analysis of DSB-3, DSB-I and DSB-II have been described. Primer sequences
used for the amplification of a genomic region devoid of DSBs were as follows:
noDSB-for: 5′-TGACAAGGACAGGGTCTTCC; noDSB-rev:
5′-CACCGTCCGTTGTATGTCTG. ChIP efficiency was calculated as the percentage of
input DNA immunoprecipitated.
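
A common way to express ChIP efficiency as a percentage of input from qPCR threshold cycles is sketched below. The text does not state which formula was used, so this is an assumption: it presumes 100% amplification efficiency and an input sample representing a known fraction of the chromatin used per immunoprecipitation. The function name, the input fraction and the Ct values are hypothetical.

```python
from math import log2
from statistics import mean


def percent_input(ct_ip, ct_input, input_fraction):
    """Percentage of input recovered in the IP.

    ct_ip and ct_input are lists of replicate Ct values (duplicates here).
    input_fraction is the fraction of chromatin kept as input, e.g. 0.01 for 1%.
    Assumes the amplicon doubles every cycle.
    """
    # Shift the input Ct to what it would be for 100% of the chromatin
    adjusted_input_ct = mean(ct_input) - log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - mean(ct_ip))


# Hypothetical duplicate Ct values with a 1% input sample
recovered = percent_input(ct_ip=[24.0, 24.0], ct_input=[25.0, 25.0], input_fraction=0.01)
print(f"{recovered:.2f}% of input")
```

With equal Ct values for the IP and a 1% input, the recovery is 1% of input; each cycle by which the IP Ct is lower doubles that figure.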

NCC. The NCC protocol was adjusted for adherent U-2-OS cells. CPT (1 µM)
was added 5 min before b-dUTP labelling and was included in all steps until
fixation. Cells were incubated for 5 min in a hypotonic buffer (50 mM KCl, 10 mM
HEPES) containing b-dUTP and resuspended into fresh cell culture medium
for an additional 15 min. Cells were fixed for 15 min in 1% formaldehyde, rinsed
twice in PBS and collected by scraping in a cold room. Nuclei were mechanically
isolated in sucrose buffer (0.3 M sucrose, 10 mM HEPES-NaOH at pH 7.9, 1%
Triton X-100 and 2 mM MgOAc). Chromatin was solubilized by 28 cycles of 30 s on,
90 s off in sonication buffer (10 mM HEPES-NaOH at pH 7.9, 100 mM NaCl, 2 mM
EDTA at pH 8, 1 mM EGTA at pH 8, 0.2% SDS, 0.1% sodium sarkosyl and 1 mM
phenylmethylsulfonylfluoride) using a Bioruptor at 4 °C. Solubilized chromatin
was pre-cleared using streptavidin-coated magnetic beads (MyC1 Streptavidin
beads) pre-incubated with biotin. b-dUTP-labelled chromatin was next purified
overnight at 4 °C using streptavidin-coated magnetic beads. Beads were washed
five times for 2 min in wash buffer (10 mM HEPES-NaOH pH 7.9; 200 mM NaCl;
2 mM EDTA pH 8; 1 mM EGTA pH 8; 0.1% SDS; 1 mM PMSF). Total chromatin
(input) and isolated nascent chromatin were boiled for 40 min on beads in 1× LSB
(50 mM Tris-HCl pH 6.8, 100 mM DTT, 2% SDS, 8% glycerol, bromophenol blue)
and separated by SDS-PAGE for western blotting. Pulse-SILAC-NCC (Fig. 3a)
was performed as described (ref. 18).

Clonogenic assay. U-2-OS inducible for GFP-TONSL ARD wild type and 
mutant were transfected with siRNA, trypsinized 24h later and seeded in tech- 
nical triplicates of 1,000 or 3,000 cells in the presence or absence of tetracycline. 
After 24h, cells were washed to remove tetracycline and CPT was added for 24h 
as indicated. Cells were then cultured in fresh medium for 12-15 days before 
fixation and staining with MeOH/Crystal Violet. Colony formation efficiency 
was determined by manual colony counting or quantification of Crystal Violet 
staining by ImageJ software and normalized to the non-induced control. Each data 
point represents a technical triplicate of 1,000 or 3,000 seeded cells within each 
biological replicate. 
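
The normalization described above can be sketched as follows: colony formation efficiency is taken as colonies counted per cell seeded, averaged over the technical triplicate, and then divided by the efficiency of the non-induced control. The function names and colony counts are hypothetical and serve only to make the arithmetic explicit.

```python
from statistics import mean


def colony_efficiency(colony_counts, cells_seeded):
    """Mean fraction of seeded cells that formed colonies across technical replicates."""
    return mean([count / cells_seeded for count in colony_counts])


def normalized_survival(sample_counts, control_counts, cells_seeded):
    """Colony formation efficiency of a sample relative to the non-induced control."""
    return colony_efficiency(sample_counts, cells_seeded) / colony_efficiency(
        control_counts, cells_seeded
    )


# Hypothetical triplicate colony counts from wells seeded with 1,000 cells
control = [100, 110, 90]
treated = [50, 55, 45]
survival = normalized_survival(treated, control, 1000)
print(f"relative colony formation = {survival:.2f}")
```

Because both efficiencies use the same seeding density, the ratio is independent of whether 1,000 or 3,000 cells were plated, provided sample and control are compared at the same density.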
Extended Data Figure 1 | TONSL binding to histones in vivo and
in vitro. a, Histones bridge the interaction between TONSL-MMS22L and
MCM2 in cell extracts as shown by co-immunoprecipitation of Flag-HA-MCM2
wild type or histone-binding mutant (Y81A, Y90A). U-2-OS cells
inducible for Flag-HA-MCM2 wild type or Y81A, Y90A were induced for
24 h before immunoprecipitation with Flag antibodies (one representative
experiment out of two is shown). b, Immunoprecipitation of GFP-TONSL
from solubilized chromatin of HeLa cells transiently transfected with
GFP-TONSL plasmid, showing that TONSL associates with nucleosomal
histones H3 and H2B (one representative experiment out of two is shown).
c, Domain structure of TONSL. LRR, leucine-rich repeats; TPR,
tetratricopeptide repeats; UBL, ubiquitin-like domain. d, Pull-down of
GST-ARD with recombinant histone H3-H4 tetramers. e, f, Pull-down
of in vitro-translated full-length TONSL with recombinant histone
H3-H4 tetramers (e) or H2A-H2B dimers (f) coupled to NHS-activated
sepharose beads (one representative experiment out of three (e) and two
(f) is shown). ASF1a wild type and histone-binding mutant (V94R) were
included as controls. g, TONSL ARD consists of four ankyrin repeats and
uses its elongated concave surface to target the H4 tail spanning residues
12 to 23.
Extended Data Figure 2 | Models and sequence alignment of TONSL
ARD. a, Pull-down assay of recombinant ARD with GST-H3 tail (amino
acids 1-59) and GST-H4 tail (amino acids 1-31). b, Modelling of
TONSL ARD on the co-chaperone structure of MCM2 HBD and ASF1
in complex with an H3-H4 dimer. When comparing the structure of
the TONSL-ARD-MCM2-HBD-H3-H4 tetramer complex with our
previous structure of the MCM2-HBD-H3-H4-dimer-ASF1 complex
(Protein Data Bank accession 5BNX), the common parts of both structures
superimposed well with a small root mean squared deviation (r.m.s.d.)
of 0.44 Å. A model of the quinary complex composed of one molecule of
each protein, TONSL ARD, MCM2 HBD, ASF1, H3 and H4, was made
after superposition. This model shows that TONSL ARD, MCM2 HBD
and ASF1 could simultaneously bind an H3-H4 dimer without steric
clash. c, Model of TONSL ARD on the structure of the nucleosome. The
model was generated by a direct superposition of the H3-H4 tetramer in
the structure of the TONSL ARD-MCM2 HBD-H3-H4 tetramer complex
onto the H3-H4 tetramer in the nucleosome structure (Protein Data Bank
accession 3AV2). There was no adjustment in the conformation of the
model and no steric clash in the model. The MCM2 HBD molecules were
omitted from the model for clarity. d, Alignment of TONSL ARD (512-692)
sequences from Homo sapiens, Mus musculus, Xenopus laevis and
Danio rerio. The secondary structures of human TONSL ARD are shown
on top of the sequence alignment. Asterisks indicate the highly conserved
residues that constitute the H4 tail-binding surface of TONSL ARD and
the three strictly conserved acidic residues forming hydrogen bonds with
the key residue H4 Lys20 are highlighted with red asterisks.
Extended Data Figure 3 | Interaction details of TONSL ARD and
GST pull-downs. a, b, Molecular details of the interactions of TONSL
ARD with H4 tail region residues 12-15 (a) and residues 21-23 (b).
The Lys12-Gly13-Gly14-Ala15 segment of H4 is positioned within a
narrow surface channel of the TONSL ARD scaffold. The intermolecular
contacts spanning the Lys12-Gly13-Gly14-Ala15 segment of H4 include
hydrophobic interactions between residues Gly13, Gly14 and Ala15 of H4
and residues Asn507, Cys508, Trp641, Tyr645 and Leu649 of ARD, as well
as hydrogen bonds between the main-chain O of H4 Gly14 and Nε1 of
ARD Trp641, and between the main-chain N of H4 Ala15 and Oη of ARD
Tyr645 (a; Fig. 1c). The main-chain O of H4 Lys16 hydrogen bonds with
the Nδ2 of ARD Asn571, while the side chain of H4 Lys16 forms contacts
with ARD Asn607 and electrostatic interactions with the side chain of
ARD Glu597 (Fig. 1c). The side chain of H4 Arg17 stacks over the side
chains of ARD Tyr572 and Cys608, while its Nη1 atom forms
two hydrogen bonds with main-chain O and Oδ1 of ARD Asn571
(Fig. 1c, e). The side chain of H4 His18 penetrates into a pocket lined by
four strictly conserved residues (Trp563, Glu568, Asn571 and Asp604) and
is positioned over His567 of ARD (Fig. 1c, f). The side chain of H4 His18 is
stacked between Trp563 and Asn571 and forms hydrogen bonds to Glu568
and Asp604 of ARD (Fig. 1f). The main-chain O of H4 Arg19 forms a
hydrogen bond with Nε1 of Trp563 and its side chain forms contacts with
Cys561 and Gly595 of ARD (Fig. 1c). Interactions with the key residue
H4 Lys20 are described in the text (Fig. 1g). The intermolecular contacts
spanning the Val21-Leu22-Arg23 segment of H4 include contacts between
side chains of H4 Val21 with Tyr560 and Cys561 of ARD (b), while H4
Leu22 interacts with Asp527 and Met528 of ARD. The main-chain N of H4
Arg23 forms a hydrogen bond with the main-chain O of Asp527 of ARD,
while the side chain packs against the side chain of Tyr560 of ARD (b).
c, Pull-down of recombinant histones H3-H4 with GST-TONSL ARD
wild type or indicated mutants. d, Pull-down of pre-purified MCM2
HBD-H3-H4 tetramer complex with GST-TONSL ARD wild type or
indicated mutants. e, Circular dichroism analysis of TONSL ARD wild
type and the indicated ARD mutants.


© 2016 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 4 image. Labels: TONSL ARD, H4 tail. ITC panel c, H3K9me1 (1-21): time (min) against µcal/sec; molar ratio against kcal/mole of injectant.]

Extended Data Figure 4 | Structural comparison of the ARDs of TONSL
and GLP. a, b, Representative view of the TONSL ARD with histone H4
tail (a; this work), and crystal structure of the GLP ARD in complex with
histone H3 tail dimethylated at Lys9 (b; ref. 10). Both TONSL ARD and
GLP ARD use the concave surface to bind their cognate target H4 tail
and H3 tail, respectively. TONSL ARD recognizes H4K20me0 mainly
through three strong hydrogen bonds with acidic residues Glu530, Asp559
and Glu568, while GLP ARD recognizes H3K9me2 mainly through an
aromatic cage formed by residues Trp839, Trp844, Glu847 and Trp877.
c, ITC analysis of TONSL ARD binding to H3K9me1 peptide. d, ITC
analysis of TONSL acidic stretch and ARD (amino acids 450-692) with
H3K9me1 (amino acids 1-21) and H4 (amino acids 9-25) peptides.
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Extended Data Figure 5 | Effect of SET8 and MOF depletion on TONSL 
chromatin binding. a, Immunoprecipitation of GFP-TONSL from 
solubilized chromatin of GFP-TONSL U-2-OS cells (one representative 
experiment out of two is shown). Same exposures are shown for input and 
immunoprecipitation western blots of H3 and H4K16ac. b, TONSL ARD 
preference for H4K16ac could be mediated by 1599 through hydrophobic 
association with the K16 acetyl group as I599E ARD mutation 
preferentially reduces binding to H4K16ac peptides as compared to the 
unmodified H4 tail. Left, pull-down of GFP-TONSL from cell extracts 
with biotinylated H4 tail peptides. Right, quantification of the western 
blot, GFP-TONSL binding to the H4K 16ac peptide is shown relative to the 
unmodified peptide. Means with individual data points are shown (n = 2). 
c, High-content quantitative imaging of TONSL in pre-extracted U-2-OS 
cells. Plots show total chromatin-bound TONSL and DAPI intensities in 
cells treated with control or TONSL siRNA, confirming the specificity of
TONSL antibody staining. Each dot represents one nucleus. d-f, Analysis
of TONSL chromatin-binding in MOF-depleted (d), SET8-depleted (e) 
and ionizing radiation (IR)-treated cells (f). Chromatin-bound TONSL 
was quantified by high content imaging of pre-extracted U-2-OS cells 
stained for endogenous TONSL. Mean TONSL intensity is shown. AU, 
arbitrary units. d, e, Knockdown efficiency and expected effect on 
histone modification were confirmed by western blotting (representative 
of two experiments). e, f, G1 cells were defined by gating on DAPI and 
EdU intensity. f, TONSL is not recruited to DNA damage in G1 cells, 
supporting that TONSL accumulation in SET8-depleted cells is due to lack 
of H4K20me1 and not DNA damage. Cells were irradiated (1.5 Gy) and
analysed 1.5 h later (representative of two experiments). d, f, Error bars
indicate s.d.; d, from left, n = 4,920, 2,341, 3,608, 2,917; f, n = 382 (−IR),
523 (+IR).

Extended Data Figure 6 | TONSL binding to chromatin during the cell
cycle. a, b, H4K20 methylation levels on new and old histones analysed 
by NCC-pulse-SILAC (data are extracted from ref. 18). Cells grown in 
light SILAC medium were released into S phase in heavy medium and 
pulsed with b-dUTP. Chromatin was fixed, sonicated and b-dUTP-labelled 
fragments isolated on streptavidin beads by NCC. Histones were isolated
and analysed by mass spectrometry for modifications on new (heavy) and
old (light) histones. For clarity a 24 h (G1/S) chase time point is included.
Error bars indicate s.d.; n = 9 (S), 3 (S/G2, M), 5 (G1), 3 (G1/S). Data for 
M (old histones) is shown as the mean of n= 2, as light peptides were not 
detected in one of the three biological replicates. c, H4K20 methylation 
levels measured by mass spectrometry in synchronized TIG3 fibroblasts.
d, Plot of mean EdU and total DAPI intensities from TIG3 fibroblasts
as in Fig. 3b, with the intensity of chromatin-bound TONSL shown in
the third dimension as a colour gradient. AU, arbitrary units. Each dot
represents one nucleus. Note that a population of G2 cells (EdU negative) 
retain TONSL on chromatin. e, High-content quantitative imaging of pre- 
extracted U-2-OS cells stained for EdU and TONSL analysed as in Fig. 3b. 
f, Analysis of TONSL chromatin binding by cellular fractionation. U-2-OS 
cells released from a nocodazole block were followed by fluorescence- 
activated cell sorting (FACS) analysis of DNA content and analysed by 
western blotting of soluble (CSK-Triton extracted) and chromatin (pellet) 
fractions (representative of two experiments).

Extended Data Figure 7 | Analysis of GFP-TONSL localization.
a, Colocalization analysis of chromatin-bound GFP-TONSL with
MCM2 analysed by deconvolution microscopy and measurement
of Pearson coefficient in single cells. Error bars indicate s.d., n = 13
from two independent experiments. Representative image, Fig. 3c.

b, c, Representative images for the analysis shown in Fig. 3d. Cells were 
either pulsed with EdU (40 µM) for 15 min (b) or synchronized in G1/S
and released into S phase in the continuous presence of EdU (541M) (c). 
Images are representative of b: n = 9 (very early), 16 (early/mid), 10 (mid/
late); c: 9 (very early), 27 (early/mid), 36 (mid/late). Scale bar, 5 µm.

b, EdU and MCM2 staining was used to determine the cell cycle state in
asynchronous (asyn) cell populations. c, Progression through S phase
was followed by FACS analysis of DNA content. d, Chromatin-binding
of GFP-TONSL analysed by cellular fractionation in inducible U-2-OS
cells as quantified in Fig. 3e. C, chromatin; S, soluble. e, f, Chromatin-
binding analysis as in Fig. 3f. U-2-OS cells conditional for GFP-TONSL
ARD wild type (WT) and mutant were directly fixed or pre-extracted to
remove soluble proteins. Data are representative of three (e) and two (f)
experiments; fields of cells in e are representative of (from left) n = 16,
18, 17 and 17 images. Scale bar, 20 µm. g, Asynchronous U-2-OS cells
conditional for GFP-TONSL were pulsed with 40 µM EdU for 15 min and
soluble proteins were extracted. Representative images of EdU-positive
cells are shown (n = 30 for wild type and N571A); for the specific patterns
of TONSL wild type see Fig. 3d. Scale bar, 5 µm.

Extended Data Figure 8 | TONSL-MMS22L recruitment to damaged
DNA. a, Left, ChIP-qPCR analysis of GFP-TONSL recruitment to site- 
specific DSBs induced by AsiSI, as shown in Fig. 4d but with additional 
controls. Note that the colours have been changed for clarity. Mean of 
technical duplicates is shown. Right, dot plot illustrating the relative 
enrichment of GFP-TONSL wild type (WT) and N571A obtained in
four independent ChIPs performed on two biologically independent
chromatin preparations. Each experiment was normalized to GFP-
TONSL wild-type enrichment at DSB-I_80bp. Mean is shown with
two-sided Mann-Whitney test; ***P < 0.001; not significant, P > 0.05;
n = 24. Two-sided Mann-Whitney analysis of individual experiments
gave similar results. b, U-2-OS cells conditional for GFP-TONSL were 
laser microirradiated. 53BP1 and cyclin B staining was used as markers 
of DNA damage and cells in S/G2 phase, respectively. Representative of 
three experiments as quantified in Fig. 4e. Filled arrowheads indicate GFP-
TONSL recruitment; open arrowheads indicate no recruitment. Scale bars,
10 µm. c, U-2-OS cells transiently transfected with GFP-TONSL wild type
or the indicated mutants were laser microirradiated and processed for
γH2A.X immunofluorescence. Representative cells are shown (n = 200
cells per condition from two independent experiments). d, U-2-OS cells
conditional for GFP-TONSL were laser microirradiated. γH2A.X and
RPA staining was used as markers of DNA damage and cells undergoing
resection in S/G2 phase, respectively. The percentage of GFP-TONSL cells
with recruitment to RPA-positive (+) and RPA-negative (−) laser tracks
is indicated. Data are representative of two independent experiments;
a total of 118 cells were counted. e, Top, U-2-OS cells conditional for
GFP-TONSL wild type and N571A were laser microirradiated. γH2A.X
and EdU staining was used as markers of DNA damage and S phase cells,
respectively. Bottom, quantification of GFP-TONSL cells with recruitment
to laser tracks. Mean with individual data points are shown (n = 2; a
total of 138 (wild type) and 174 (N571A) cells were counted). f, H4K20
methylation levels measured by mass spectrometry in synchronized TIG3
cells as in Extended Data Fig. 6c. Cells were released into S phase for 3 h
and treated with hydroxyurea (HU; 3 mM) or CPT (1 µM) for 3 h or left
untreated (6 h). Mean with individual data points are shown (n = 2).
g, Colony formation in cells treated with control or TONSL siRNA and
induced to express GFP-TONSL. As shown in Fig. 4f, but including
additional mutants. Two cell concentrations in technical triplicate from
two (E568A, D559A) or four (wild type, N571A) biological replicates are
shown. h, Representation of the complementation analysis from Fig. 4f
in a single panel including both CPT-treated and untreated cells. This
illustrates that the toxicity of the TONSL ARD mutant is comparable
to CPT treatment of cells expressing wild-type TONSL. i, Analysis of
GFP-TONSL and MMS22L by cellular fractionation in cells inducible for
GFP-TONSL ARD wild type and mutant. Representative experiment of
the quantification shown in Fig. 4i.

Extended Data Figure 9 | Similarity of the ARDs in TONSL and
BARD1, and protein inputs. a, Superposition of the structures of TONSL
ARD and BARD1 ARD (Protein Data Bank accession 3C5R)*®. The
main residues involved in TONSL ARD interactions with the H4 tail are
compared to the corresponding residues of BARD1 ARD. The two ARDs
show highly similar topology and conservation of the histone-binding
surface. b, Input material of the experiment in Fig. 1h. c, Input material of
the experiment in Fig. 1j. d, Spot assay with biotinylated H4 tail (amino
acids 14-33) peptides confirming equal input into pull-down reactions.
e, Input material of the experiment in Fig. 4b. f, Input material of the
NCC experiment in Fig. 4c. Note that because ARD mutation disrupts
chromatin binding in the presence and absence of CPT (Figs 3e, f and 4a),
GFP-TONSL N571A levels are low in the input chromatin. The NCC
experiment in Fig. 4c supports our microscopy-based data (Fig. 4a) and
further shows that there is no local accumulation of the GFP-TONSL ARD
mutant at damaged forks that could have been missed in our microscopy-
based quantification of total TONSL on chromatin.

Extended Data Table 1 | Data collection and refinement statistics


                                  TONSL ARD - MCM2 HBD - H3/H4 tetramer complex

Data collection
Space group                       P321
Cell dimensions
  a, b, c (Å)                     139.5, 139.5, 72.9
  α, β, γ (°)                     90, 90, 120
Resolution (Å)                    50-2.43 (2.52-2.43)*
Rpim (%)                          3.8 (46.8)
I/σI                              23.1 (1.8)
Completeness (%)                  99.8 (99.7)
Redundancy                        5.5 (5.5)

Refinement
No. reflections (total/unique)    171,308/31,146
Rwork/Rfree (%)                   20.1/24.6
No. atoms
  Protein                         2,908
  MES                             12
  GOL                             12
  Water                           87
B-factors
  Protein                         81.8
  MES                             108.5
  GOL                             92.6
  Water                           59.8
R.m.s. deviations
  Bond lengths (Å)                0.009
  Bond angles (°)                 1.316

*Highest-resolution shell is shown in parentheses. One crystal was used for the data.

Translation readthrough mitigation


Joshua A. Arribere1, Elif S. Cenik1, Nimit Jain2, Gaelen T. Hess3, Cameron H. Lee3, Michael C. Bassik3 & Andrew Z. Fire1


A fraction of ribosomes engaged in translation will fail to 
terminate when reaching a stop codon, yielding nascent proteins 
inappropriately extended on their C termini. Although such 
extended proteins can interfere with normal cellular processes, 
known mechanisms of translational surveillance1 are insufficient to
protect cells from potential dominant consequences. Here, through 
a combination of transgenics and CRISPR-Cas9 gene editing in 
Caenorhabditis elegans, we demonstrate a consistent ability of cells 
to block accumulation of C-terminal-extended proteins that result 
from failure to terminate at stop codons. Sequences encoded by 
the 3’ untranslated region (UTR) were sufficient to lower protein 
levels. Measurements of mRNA levels and translation suggested a 
co- or post-translational mechanism of action for these sequences in 
C. elegans. Similar mechanisms evidently operate in human cells, 
in which we observed a comparable tendency for translated human 
3’ UTR sequences to reduce mature protein expression in tissue 
culture assays, including 3’ UTR sequences from the hypomorphic 
‘Constant Spring’ haemoglobin stop codon variant. We suggest that 
3’ UTRs may encode peptide sequences that destabilize the attached 
protein, providing mitigation of unwelcome and varied translation 
errors. 

Failure to terminate translation at a stop codon can lead to ribo- 
somes translating into a 3’ UTR. In some cases, translation may pro- 
ceed through the 3’ UTR and into the poly(A) tail, triggering a process 
termed ‘nonstop decay’ and destabilizing both the mRNA and nascent 
protein (reviewed in ref. 1). However, for the majority of 3’ UTRs, a 
stop codon is encountered before the poly(A) tail?*. Readthrough 
events that encounter a subsequent termination codon are outside the 
scope of known translational surveillance pathways including nonstop1.
Depending on the 3’ UTR and the frame in which the ribosome enters,
the late stop codon can be several, tens, or even hundreds of codons
into a 3’ UTR, producing variant proteins with potentially problematic
C-terminal appendages. This issue is highlighted by several pathologies 
caused by late frameshifts or stop-codon mutations in which 3’-UTR- 
encoded C-terminal extensions effect protein mislocalization*’, 
aggregation®”, and instability* '7, with severe consequences for organ- 
isms. Depending on sequence, genetic background, conditions, and 
organism, estimates of readthrough efficiency vary from <1% to 10% 
or more, posing a potential problem of nontrivial magnitude’®”’, 

We investigated whether, and to what extent, 3’ UTR translation has 
an effect on gene expression using a fluorescent reporter system in 
C. elegans. Initially, we selected 3’ UTRs from three genes: unc-54 
(encoding a muscle myosin), tbb-2 (a beta tubulin), and rpl-14 (a ribosomal
protein). For each gene, fusion of the 3’ UTR to a green fluorescent
protein (GFP) driven by the myo-3 promoter resulted in robust
fluorescence in body-wall muscle (Fig. 1a). Next, by mutating stop
codons, we created GFP reporters for each gene in which translation
reads past the normal termination point, terminating instead
at a stop codon part-way through the 3’ UTR (Fig. 1b). In each case, the
‘late stop’ reporter accumulated substantially less GFP, with differences
in signal of at least tenfold. As a control, a co-injected mCherry marker
was robustly expressed in the same cells. We conclude that translation
into the 3’ UTR can confer substantial loss of protein expression for at
least these three 3’ UTRs in C. elegans.

To test whether translation into 3’ UTRs could confer a loss of 
protein expression more generally, a two-fluorescent-reporter system 
with each fluorophore transgene containing an identical 3’ UTR was 
used. Nine genes were chosen to reflect a variety of functions and 
expression levels: rps-17 (small ribosomal subunit component), R74.6
(dom34/pelota release factor homologue), hlh-1 (muscle transcription
factor), eef-1A.1 (also known as eft-3, translation elongation factor),
myo-2 (a pharyngeal myosin), mut-16 (involved in gene/transposon
silencing), bar-1 (a beta catenin), daf-6 (involved in amphid
morphogenesis), and alr-1 (neuronal transcription factor). A criterion in
choosing these genes was the presence (common for C. elegans genes,
Extended Data Fig. 1) of an in-frame stop codon in the 3’ UTR at least
30 bases beyond the normal stop but upstream of known poly(A) sites.
We fused the 3’ UTRs of each gene separately to GFP and mCherry, 
removing the canonical termination codon in the GFP construct. For 
each of the nine genes tested, the observed GFP signals were extremely 
faint, with raw GFP:mCherry fluorescence ratios of less than 0.1 (Fig. 1c, 
Extended Data Fig. 2). As a control, versions of the GFP reporter with 
the normal termination codon intact provided robust GFP expression, 
at least tenfold higher than the corresponding readthrough constructs 
(GFP:mCherry fluorescence ratios in the range of 0.3 to 0.9). 
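
The selection criterion above (an in-frame stop codon in the 3’ UTR at least 30 bases past the normal stop) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names `readthrough_region` and `passes_length_criterion` are hypothetical, the input is assumed to be the DNA sequence immediately following the canonical stop codon, and the poly(A)-site part of the criterion is not modelled.

```python
# Hypothetical helper names; a sketch of locating the first in-frame stop
# codon in a 3' UTR, not the authors' actual analysis code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def readthrough_region(utr, frame=0):
    """Return (codons_added, nt_sequence) up to the first stop codon.

    `utr` is the DNA immediately after the canonical stop codon.
    `frame` is 0 for in-frame entry, 1 or 2 for frameshifted entry.
    Returns None if no stop codon is met before the sequence ends.
    """
    utr = utr.upper()
    for i in range(frame, len(utr) - 2, 3):
        if utr[i:i + 3] in STOP_CODONS:
            return (i - frame) // 3, utr[frame:i]
    return None


def passes_length_criterion(utr, min_bases=30):
    """True if the first in-frame stop lies at least `min_bases` into the UTR."""
    hit = readthrough_region(utr)
    return hit is not None and len(hit[1]) >= min_bases


if __name__ == "__main__":
    toy_utr = "GCTAAAGGATGACCC"  # made-up sequence for illustration
    for frame in range(3):
        print(frame, readthrough_region(toy_utr, frame))
```

Scanning all three frames, as in the usage loop, mirrors the point that the length of the appended peptide depends on the frame in which the ribosome enters the 3’ UTR.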

Several observations suggest how translation into 3’ UTRs might 
reduce protein levels. (1) Experiments with specific mutagenesis support
a role for the eventual protein sequence. Shortening readthrough
peptides (tested for unc-54 and tbb-2) increased GFP expression
(Fig. 2a). Extending this analysis, an equal-length non-synonymous
substitution in the unc-54 3’ UTR restored GFP expression, whereas 
synonymous substitution with multiple base differences did not. 
(2) Mutagenesis analysis of constructs using a constant 3’ UTR rein- 
forced the inference of peptide sequence as the primary determinant of 
GFP loss. We found that the nucleotide sequence between the normal 
termination codon and the first in-frame termination codon was suffi- 
cient to confer GFP loss if inserted at the end of the GFP coding region 
for unc-54, tbb-2, hlh-1, daf-6, rps-20, or rps-30 (Fig. 2b). The rps-30 
readthrough region had the weakest effect on GFP, and was the shortest 
(nine amino acids). We performed further mechanistic dissection by 
synonymous variation of readthrough regions from unc-54, tbb-2, 
and rps-20, with GC contents from 35-60%, in some cases mutating 
>50% of bases. Each synonymously substituted variant conferred robust 
loss of GFP. (3) Decreased expression following translation into the 
3’ UTR required peptide linkage between the upstream protein and the 
3’-UTR-encoded segment. To assess the relationship between covalent 
linkage with the translated C-terminal peptide and the outcome for the 
larger protein, we took advantage of a picornavirus-derived oligopep- 
tide sequence that causes cleavage and release of the nascent chain, after 
which ribosomes continue translation of the downstream sequence!*!9, 
Insertion of the T2A peptide (EGRGSLLTCGDVEENPGP) between 
GFP and the unc-54 3'-UTR-encoded sequence rescued GFP expres- 
sion, whereas an uncleavable T2A* point mutant did not (Fig. 2b). 
Restoration of GFP levels by T2A to the level of no-insert controls 


1Department of Pathology, Stanford University School of Medicine, Stanford, California 94305, USA. 2Department of Bioengineering, Stanford University, Stanford, California 94305, USA.
3Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA.

Figure 1 | Translation into 3’ UTRs results in substantial loss of
protein expression. a, Dual fluorescence reporter assay to test expression 
with different 3’ UTRs. Transgenic arrays of each GFP construct were 
created using pha-1 selection and mCherry (pCFJ104) as a coinjection 
marker. Broad filter detects GFP and mCherry signals simultaneously; 
deviation from yellow towards red or green shows more mCherry or
GFP fluorescence, respectively. Three independent transgenic lines were
made for each (two for tbb-2(TerByP)); transgenic lines with similar
mCherry expression are shown. 200 ms exposure, 10× objective. b, Dual
fluorescence reporter assay to test expression of readthrough for different
3’ UTRs. The stop codon of each 3’ UTR was mutated, allowing translation
to proceed into the 3’ UTR. TerByP refers to ‘Termination ByPass’, the


also argues against mRNA destabilization as a substantial factor in the 
protein loss observed upon readthrough. 
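
The synonymous-variation test described in point (2) rests on changing the nucleotide sequence while leaving the encoded peptide untouched. A minimal Python sketch of that idea follows. It assumes the standard genetic code and a random choice among alternative codons; the helper names and the example codon choices for the first six T2A residues (EGRGSL) are hypothetical, and the authors' actual variant design is not specified here.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_variant(dna, rng):
    """Swap each codon for a different synonymous codon where one exists."""
    usable = len(dna) - len(dna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        alternatives = [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        out.append(rng.choice(alternatives) if alternatives else codon)
    return "".join(out)


def fraction_changed(a, b):
    """Fraction of nucleotide positions that differ between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    original = "GAAGGTCGTGGTTCTCTG"  # hypothetical codons encoding EGRGSL
    variant = synonymous_variant(original, random.Random(0))
    print(translate(original), translate(variant), fraction_changed(original, variant))
```

A non-synonymous control would instead alter the translated peptide while keeping length constant, which is the contrast the mutagenesis experiments draw.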

The above results could be explained if GFP was generally incom- 
patible with C-terminal fusions in our system. To address this, 
we inserted a variety of sequences downstream of GFP: 3× Flag,
3× haemagglutinin (HA), three random sequences created in silico, and
six arbitrary fragments of in-frame coding sequence from C. elegans

Figure 2 | Identification of determinants for product loss upon
translation into the 3’ UTR. a, Shortening or non-synonymous mutations
of the readthrough region can restore GFP expression. Stop codons and/or
mutations were inserted into each GFP::3’-UTR fusion as shown, with stop
codons (red stop sign) and poly(A) site (blue arrowhead). † indicates the
constructs shown in Fig. 1. mCherry (pCFJ104) was used as a coinjection
marker. ‘+X AA’ indicates amino acids added relative to cognate control
(‘+0 AA’) construct. Constructs and mutated regions drawn to scale,
scale bar at top. Mean and s.d. of n (shown in parentheses) lines shown.


720 | NATURE | VOL 534 | 30 JUNE 2016 


Figure 1c (panel data) | GFP:mCherry fluorescence, mean ± s.d. (n lines)

Gene        3’ UTR (normal stop)    TerByP (readthrough)    P
eef-1A.1    0.39 ± 0.15 (3)         0.02 ± 0.01 (3)         0.022
rps-17      0.81 ± 0.08 (3)         0.08 ± 0.02 (2)         0.001
daf-6       0.46 ± 0.1 (3)          0.02 ± 0.01 (3)         …
hlh-1       0.44 ± 0.08 (3)         0.01 ± 0.01 (3)         …
mut-16                              0.03 ± 0.01 (3)
alr-1                               0.08 ± 0.02 (3)
R74.6                               0.04 ± 0.01 (3)
myo-2                               0.01 ± 0.01 (3)
bar-1                               0.05 ± 0.01 (3)



region between the canonical termination codon and first in-frame 
termination codon in the 3’ UTR. Images were collected as in a. ‘GFP
(10x)’ is a 2 s exposure. The dim yellowish fluorescence in ‘GFP (10x)’
for unc-54(TerByP) and tbb-2(TerByP) is autofluorescence. c, For each
gene, the 3’ UTR was fused to mCherry and GFP. GFP expression was
tested with the stop codon mutated to a sense codon (TerByP). For each
of eef-1A.1, rps-17, daf-6, and hlh-1, GFP expression was also tested
with the normal stop codon in place (3’ UTR). The ratio of GFP to
mCherry fluorescence under a broad fluorescence filter was used as a
metric (Extended Data Fig. 2, Methods). Each triangle represents an
independently generated transgenic line; mean and s.d. of n (shown in
parentheses) lines shown. P values are from two-tailed Student's t-tests.
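
As a quick arithmetic check of the ratio metric in panel c, the fold reduction for the four genes tested with both constructs can be computed from the mean GFP:mCherry ratios printed in the figure. The sketch below uses only those reported means; the dictionary and function names are illustrative, and no per-line data or statistical test is reproduced.

```python
# Mean GFP:mCherry ratios taken from Fig. 1c:
# (normal-stop 3' UTR control, TerByP readthrough construct).
FIG1C_MEANS = {
    "eef-1A.1": (0.39, 0.02),
    "rps-17": (0.81, 0.08),
    "daf-6": (0.46, 0.02),
    "hlh-1": (0.44, 0.01),
}


def fold_reduction(control, readthrough):
    """Ratio of control to readthrough signal (larger means stronger loss)."""
    return control / readthrough


folds = {gene: fold_reduction(c, r) for gene, (c, r) in FIG1C_MEANS.items()}

if __name__ == "__main__":
    for gene, fold in folds.items():
        print(f"{gene}: {fold:.1f}-fold lower with readthrough")
```

All four values come out at tenfold or greater, consistent with the "at least tenfold" statement in the main text.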


genes, approximately length-matched to 3’-UTR-encoded sequences
(Fig. 2b). GFP expression varied between constructs but was generally
higher than for 3’-UTR-encoded sequences: 3× HA, 3× Flag, 2 out of 3
random sequences, and 4 out of 6 coding-derived fragments exhibited
GFP:mCherry fluorescence ratios of >0.13, higher than all nine
tested 3’-UTR-derived C-terminal extensions, a statistically significant
difference (P = 0.004, Kolmogorov-Smirnov test). Thus the effects of

b, 3’-UTR-encoded peptides are sufficient to confer GFP loss. Sequences
were inserted upstream of the let-858 3’ UTR. ‘syn’, synonymously
substituted variants. Shuffle1-3 contain shuffled codons of unc-54, tbb-2,
and rpl-14(VLFL to RSCA) TerByP regions (Extended Data Fig. 4). T2A,
‘self-cleaving’ peptide which releases the upstream nascent chain; T2A*,
a non-cleaving variant!*!°. Rand1-3(A, C, G, T) are random combinations
of A, C, G, and T created in silico. CDS N-M is an arbitrary fragment of
the respective coding DNA sequence of the gene (from amino acid N to M).

Figure 3 | Translation into the 3’ UTR at an endogenous locus decreases
protein levels. a, Schematic of wild-type and readthrough alleles of unc-54,
the latter made using CRISPR-Cas9 genome editing'”. See Extended
Data Table 1 for additional loci and edits. b, Brightfield images of unc-54
alleles. Arrowhead indicates a ‘bag of worms’, the shell of an egg-laying-
defective mother consumed by its retained progeny. c, RNA-seq from
unc-54(TerByP/+) heterozygotes showed no differential effect on RNA
levels. unc-54(TerByP/+) heterozygotes were chosen among progeny of
unc-54(+) males crossed with unc-54(TerByP) homozygotes and


3’-UTR-encoded sequences are not explained by a general intolerance 
of GFP to C-terminal extensions (see also Methods, Extended Data 
Figs 3, 4). 

It was conceivable that peculiarities of GFP and/or transgene 
expression systems might underlie the above observations. To estab- 
lish effects of 3’ UTR translation at endogenous genes, we sought 
loci where (1) a loss of protein would be detectable phenotypically, 
(2) C-terminal fusions are known to be functional, (3) the next in-frame 
stop codon of the endogenous locus is >10 amino acids past the 
annotated stop codon, yet upstream of annotated poly(A) sites’®, and 
(4) there is little or no autoregulation/feedback. unc-54 and unc-22 meet 
all of the criteria, and pha-4, unc-45, and tra-2 at least the first three 
points (Methods). For each locus, we mutated the stop codon to allow 
translation into the 3’ UTR’ (Fig. 3a). In parallel, we analysed small 
insertions/deletions generating late frameshifts for unc-22 and unc-54. 
Additional controls had length-matched sequences and/or GFP tags at 
the C terminus (Extended Data Table 1). For each of unc-22, unc-45, 
unc-54, and tra-2, translation into the 3’ UTR in at least one frame 
generated a strong hypomorphic (near null) phenotype specific to each 
locus. Other C-terminal tags for each gene did not cause loss of expres- 
sion, although one tra-2 C-terminal tag did produce a Tra phenotype. 
The ability to place alternative tags on the C terminus without obvious 
phenotypic consequences argues against a general sensitivity of the C 
terminus to tagging. For unc-22 and unc-54, the fact that elongation into the 3’
UTR elicited a hypomorphic phenotype in only some frames suggests
that ribosome elongation into the 3’ UTR is not detrimental per se.

To determine the consequences for gene expression of translation
into 3’ UTRs, we analysed the effects of the unc-54(cc3389) TAA(stop)
to AAT(Asn) mutation on RNA, translation, and protein output.
We analysed mRNA expression in unc-54(cc3389/+) heterozygotes 
(phenotypically wild type to avoid complications from an Unc pheno- 
type). RNA-seq revealed that the unc-54(cc3389) and wild-type alleles 
were at approximately equal amounts in the mRNA pool, suggesting 
that 3’ UTR translation does not appreciably destabilize the unc-54 
mRNA (Fig. 3c). In parallel, we detected an ~20-fold reduction in 
UNC-54 protein in immunoblots in unc-54(cc3389) mutants (Fig. 3d). 
To look for possible alterations in translation for unc-54(cc3389), 
we examined the distribution of RNase-protected mRNA fragments

allele-specific reads identified. The framed inset shows individual allele-
specific RNA-seq reads (bars) from unc-54(TerByP) (AAT, green) and
unc-54(+) (TAA, blue). See also Extended Data Figs 5, 6. d, Quantification
of UNC-54 protein levels. Immunoblotting was performed on
homozygous populations of the indicated animals. unc-54(r293) encodes
a nonsense-mediated decay allele of unc-54, producing <5% of normal
UNC-54 protein. unc-54(r259) contains a >17 kb deletion spanning most
of the unc-54 locus. For the lower blot, the number of animals loaded per
lane is indicated. For gel source data, see Supplementary Fig. 1.


with ribosome footprint profiling'®. We observed no significant dif- 
ference in the loading of ribosomes on unc-54 mRNA (Extended Data 
Fig. 5), nor on the number, distribution, frame, or fragment size of 
ribosomes in the extended region (Extended Data Fig. 6). 

A model that arises from these observations is that 3’-UTR-encoded
peptides mark their resulting products for destruction, either co- or
post-translationally. Conceivably this process might operate either in
a specific cell/tissue type or in a broad spectrum of different contexts.
A broadly expressed reporter bearing a readthrough extension would be
expected to highlight any tissue that failed to destabilize the C-terminal
peptide. Using a broadly expressed promoter (unc-37) driving
GFP with and without the unc-54 3’-UTR-encoded peptide, we
observed no cells where GFP was robustly retained (data not shown).

We likewise considered the possibility that 3’-UTR-encoded peptides
might act to limit protein levels in human cells, developing a specific
assay using a lentiviral dual fluorescence reporter encoding puromycin
N-acetyl-transferase tethered to mCherry-T2A, followed by eGFP
and a multiple cloning site (Fig. 4a). The resulting reporter expresses
both fluorophores from the same mRNA, yet as two disjoint polypeptides,
allowing consideration of the effect of a peptide tag on eGFP
expression independent of effects on mCherry and mRNA expression.
We validated the split dual fluorophore approach in K562 cells using
tags known to be destabilizing (d1ODC, d4ODC’”) or not (3× Flag,
3× HA) (Fig. 4b). We selected 13 genes of varying expression and function,
and inserted the region between the annotated termination codon
and first-in-frame termination codon downstream of eGFP. For 9 of 
13 genes, the readthrough region reduced the eGFP:mCherry fluo- 
rescence ratio between 3- and 30-fold, a stronger reduction than the 
degron d4ODC (Fig. 4c). Although not universal, the substantial loss 
of eGFP fluorescence for a majority of readthrough regions opens up 
the possibility that translation into 3’ UTRs may be generally inhibitory 
to expression across systems. 

We hypothesize that a function of 3’ UTRs is to minimize the accu- 
mulation of extended protein products that could be produced through 
translational readthrough. 

This feature may prove generally important in causes of genetic 
disease. For example, readthrough alleles (such as stop to Gln) of the 
HBA2 locus in humans produce a fraction (~1%°) of normal HBA2

Figure 4 | Translation into 3’ UTRs results in protein loss for several
genes in humans. a, Lentiviral reporter schematic. A puroR-mCherry
fusion was co-translationally cleaved from eGFP-insert by T2A.
Constructs in b-d were integrated into K562 cells via lentiviral infection
and puromycin selection. b, Validation of dual fluorescence reporter.
Inserts downstream of eGFP were 3× Flag, 3× HA, and degrons d4ODC
(half-life, ~4 h!°), d1ODC (half-life, ~1 h!’). c, The sequence between
the annotated and first in-frame termination codon (TerByP) from each
gene was inserted downstream of eGFP (solid line). For comparison,
nucleotides of each TerByP region were randomized, producing a length-
and nucleotide-frequency-matched construct (randTerByP, dashed line).
Cells with eGFP lacking an insert and grown a week apart (top, green
solid lines) and approximate fluorescence ratio of d4ODC (orange line)
are shown. d, The first 30 amino acids encoded by the HBA2 3’ UTR were inserted
downstream of eGFP (orange). Insertion of a self-cleaving T2A peptide
restored expression (blue), whereas an uncleavable mutant (T2A*) did not
(light blue).


protein (a-globin), causing thalassemia. Translation into the HBA2 3’ 
UTR is known to destabilize the HBA2 mRNA”, but it is unclear what 
effect the appended C-terminal 31 amino acids have on HBA2 protein. 
We considered the possibility that the HBA2 3’-UTR-encoded peptide 
might prevent protein expression in humans, contributing to the loss of 
HBA2 protein. When appended to eGFP, the HBA2 3’-UTR-encoded 
peptide decreased the eGFP:mCherry fluorescence ratio in K562 cells 
(Fig. 4d). Furthermore, eGFP fluorescence was rescued by a self-cleaving
T2A peptide, but not by an uncleavable mutant.

Several observations from the literature support the notion that 
3’-UTR-encoded peptides may be detrimental to expression for more 
genes and organisms than those assessed here. In Saccharomyces 
cerevisiae, translation past a point in the HIS3 3’ UTR confers a sub- 
stantial loss in protein expression, without detectable effects on mRNA 
levels’. Similarly, readthrough of the cyclic AMP phosphodiesterase 
PDE2 stop codon produces a destabilized protein variant, and this has 
been suggested to explain elevated cyclic AMP levels in [PSI+] yeast!°.
Differential stability conferred by polymorphisms in the readthrough peptide of
SKY1 has been postulated to explain [PSI+]-induced strain differences
in diamide sensitivity”!. Particularly intriguing are recent findings that
stop codon mutations at the c-FLIP, locus confer protein instability 
for this anti-apoptotic factor in mice, leading to embryonic lethality’”. 
The same study also noted several hereditary human disease alleles 
where 3'-UTR-encoded peptides are destabilizing, conferring marked 
decreases in protein activity and level’. 

Not every case of stop codon readthrough is destabilizing*’ (Fig. 4c), 
and some readthrough events are functional and regulated to defined 
levels*-*5. Understanding the mechanisms by which some readthrough 
events are detected and cleared (whereas others are not) may prove 
informative in biological contexts and pathological states where inappropriate 
readthrough occurs. We do not yet know the determinants of a 
translated 3’ UTR sequence that confer loss of protein, though the ability 
of numerous sequences to do so (including shuffled and randomized 3’ UTR 
variants, Fig. 2b, 4c) suggests that a highly degenerate sequence is sufficient. 
Consistent with the idea that the effects of readthrough peptides 
may be mediated via their biophysical characteristics, we observed a 
significant negative relationship between hydrophobicity and expression 
of GFP (K562 cells, C. elegans) and endogenous loci (unc-22 and unc-54, 
C. elegans) (Extended Data Figs 7-9, Supplementary Information). 

Destabilization by 3’-UTR-encoded peptides could effectively 
mitigate at least three types of events in which a stop codon is inappropriately 
bypassed: (1) Stop codon misreading (for example, by 
suppressor tRNAs). Suppressor tRNAs permit readthrough of up to 
30% of ribosomes at a stop codon (UAA, UAG, or UGA)'*. Whereas 
some suppressor tRNAs can be toxic, other cells tolerate even high 
levels of readthrough!?°*®. Destabilization of readthrough products 
by C-terminal appendages may effectively buffer cells from suppressor 
tRNA-induced proteostatic chaos. (2) A ribosomal frameshift in 
a coding region which is late enough that no premature termination 
codon is encountered. In this case, ribosomes would enter the 3’ UTR 
out-of-frame with the coding region. In our manipulations, translation 
of 3’ UTRs in multiple frames was detrimental to expression (Extended 
Data Table 1, data not shown), and similar amino acid and hydropathy 
biases hold for all three 3’ UTR frames (Extended Data Figs 8, 9). 
(3) Errors in RNA processing or ribosome dysfunction could produce a 
variety of other improperly terminated peptides from which translation 
readthrough mitigation would provide valuable relief. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


C. elegans strain construction and husbandry. C. elegans were grown at 23°C 
on agar plates with nematode growth medium seeded with Escherichia coli strain 
OP50 as described”. Some strains were provided by the CGC, which is funded by 
NIH Office of Research Infrastructure Programs (P40 OD010440). A full list of 
strains used is available in Supplementary Table 1. 

Transgenic array-containing strains were generated as follows: PD5102 
(pha-1(e2123ts)I; rde-1(ne300)V) young adult hermaphrodites (grown at 16°C) 
were injected with a mix of 90 ng µl⁻¹ pC1 (containing a rescuing fragment of 
pha-1), 5 ng µl⁻¹ of an mCherry-containing vector, and 5 ng µl⁻¹ of a GFP- 
containing vector. Unless otherwise indicated, GFP was driven by the myo-3 
promoter to drive expression in the body-wall muscle*’. Injectants were shifted 
to 23°C to select for F1 progeny animals bearing a transgenic array (selecting for 
pha-1(+) expression*!). The rde-1 allele included in this strain avoided a modest 
degree of secondary siRNA-based silencing observed with many extrachromo- 
somal transgenes*. For transgenic lines generating low levels of GFP, we consid- 
ered the possibility that the GFP protein was toxic and selected against. Under 
this model, one might expect (1) a subset of sick and/or dead GFP-positive F1 ani- 
mals, (2) muscle defects due to muscle-specific expression of potentially-toxic GFP 
derivatives, (3) concomitant low levels of mCherry, and/or (4) a decrease in the 
efficiency with which transgenic lines were obtained**. None of these effects were 
observed, arguing against any contribution of negative selection to the observed 
low GFP expression. 

For a subset of strains, we deviated from the above protocol to generate pha-1 
arrays as follows: (1) whereas most transgenic lines were generated from inde- 
pendently injected parents, a handful of strains were possibly generated from 
siblings of an injected parent (PD6480, 6481, 6482, 6483, 6484, 6485, 6486, 6493, 
6494, 6495). In these cases, all injectants were pooled together on the same plate, 
and independent F1 were picked off to generate transgenic lines. Previous work 
has demonstrated independent F1 from the same injected parent carry distinct 
transgenic arrays****. (2) During the course of our analyses, we found some strains 
with an mCherry-negative subpopulation. The mCherry-positive subpopulation 
was isolated and propagated to generate the strains PD6401, 6450, 6452, 6456, 
6457, and 6464. 

CRISPR-Cas9 genome editing was performed in the VC2010 (PD1074) N2 
background as described!”. We selected pha-4 (refs 34, 35), unc-45 (refs 36, 37), 
tra-2 (refs 38-40), unc-22 (ref. 29), and unc-54 (refs 26, 38, 41, 42) based on the 
criteria in the text (citations indicated). The statement that unc-22 and unc-54 
exhibit little or no autoregulation/feedback is based on a number of genetic experi- 
ments (with heterozygous”, amber-suppressed”°, and/or smg-suppressed* alleles) 
which express either UNC-54 or UNC-22 at stable intermediate levels (between 
wild type and null). Alleles of unc-45 were initially generated in the VC2010 
background, though the embryonic lethality made unc-45(TerByP) difficult 
to maintain. We subsequently remade all alleles in a balanced heterozygote 
background (sC1(s2023)(dpy-1(s2170)) III/+) and considered non-Dpy segregants 
for phenotypic analyses. 

Human cell line construction. K562 cells (obtained from ATCC) were grown 
at a density of ~0.5 to 1 × 10⁶ cells ml⁻¹ in RPMI medium supplemented with 
penicillin/streptomycin, L-glutamine, and 10% FBS. All cell lines were maintained 
in a humidified incubator (37°C, 5% CO₂), and checked regularly for mycoplasma 
contamination. As a means of validating K562 cells, we performed RNA-seq on 
a subset of lines and observed good correlation with published data sets*? (data 
not shown). Viral particles were produced in HEK293T cells in 6-well dishes, and 
1 ml of viral supernatant was used to infect ~100,000 K562 cells by spin infection, 
10³ relative centrifugal force for 2 h. Polybrene was omitted to keep the 
infection rate low (<10%), ensuring a single incorporation event for most cells. 
After 3 days of recovery, cells were selected with puromycin at 0.7 µg ml⁻¹ for at 
least 3 days. Fluorescence was examined on a BD Accuri C6 flow cytometer, with 
appropriate gating for live cell events and investigators blinded to cell line identity. 
For each construct examined via puromycin selection in K562 cells, similar eGFP 
and mCherry fluorescence levels were also observed in transient transfection in 
HEK293T cells in the absence of puromycin, arguing against a puromycin-selected 
skew in mCherry fluorescence. 

Plasmids. Plasmids were constructed by restriction digest or Gibson cloning as 
detailed in Supplementary Table 2. pJA138/L3785 and pJA137/pCFJ104 were used 
as the basis of all C. elegans GFP- or mCherry-containing vectors, respectively. 
Portions of pMCB306 and pMCB309 were used to construct pJA291, the parental 
puro::mCherry::T2A::eGFP::MCS::wPRE vector for experiments in human cells. 
Plasmids were confirmed by both sequencing and restriction digest, and plasmid 
concentrations determined with the QuBit dsDNA Broad Range kit (Invitrogen). 
Plasmids that may be useful have been deposited with Addgene: pJA327 (C. elegans 
superfolder GFP in L3785), pJA291, pJA317 (pJA291 with d1ODC insert) and 
pJA318 (pJA291 with d4ODC insert). 


GFP fusions were constructed with a GFP variant that corresponds to wild- 
type (Aequorea) GFP with mutations at position 65 (Ser to Thr for human, Ser 
to Cys for C. elegans) known to improve folding and acquisition of fluorescence. 
Even with these mutations, GFP has a known propensity to misfold under some 
circumstances, and we therefore examined the effect of a subset of the 3’-UTR- 
encoded sequences (hlh-1, daf-6, and unc-54) downstream of a faster and more 
robust-folding GFP variant, superfolder GFP (ref. 44). The observed reduction in 
superfolder-GFP:mCherry ratios was quantitatively similar to that observed with 
normal GFP (Extended Data Fig. 3). 

Sequences of Flag*®, HA“, d1ODC"’, and d4ODC!? were obtained from the 
indicated publications. For exact sequences, see Supplementary Tables 1 and 2. T2A 
was previously shown to function in C. elegans'*. Translation elongation through a 
member of the 2A peptide family (consensus D(V/I)EXNPGP) causes ribosomal 
pause, then release of the N-terminal peptide up to and including the Gly’. 
Translation elongation resumes, with the C-terminal peptide being produced 
with an N-terminal Pro. 
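
The 2A behaviour described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `split_at_2a` is hypothetical, the consensus D(V/I)EXNPGP is taken from the text and encoded as a regular expression, and the T2A sequence used in the usage note is the commonly cited one rather than the exact insert listed in the Supplementary Tables.

```python
import re

# Consensus from the text: D(V/I)EXNPGP, where X is any residue.
TWO_A_CONSENSUS = re.compile(r"D[VI]E.NPGP")


def split_at_2a(protein):
    """Return (upstream, downstream) products for the first 2A site found.

    The upstream product ends with the Gly of the motif and the downstream
    product begins with the final Pro. If no motif is present, the input is
    returned unchanged together with None.
    """
    match = TWO_A_CONSENSUS.search(protein.upper())
    if match is None:
        return protein, None
    cut = match.end() - 1  # index of the terminal Pro of the motif
    return protein[:cut], protein[cut:]
```

For example, with a toy upstream peptide `MKT`, the commonly cited T2A sequence `EGRGSLLTCGDVEENPGP` and a toy downstream peptide `MVS`, the sketch returns an upstream product ending in Gly and a downstream product starting with Pro.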
Microscopy. Animals were immobilized by placing on a slide with a coverslip in 
5mM EDTA, 50mM NaCl, 1 mM levamisole and imaged on a Nikon Eclipse E6000 
microscope using a Nikon super high pressure mercury lamp power supply. Filter 
cubes for fluorescence images were GFP (96342, Nikon Corp), mCherry (96321, 
Nikon Corp), and broad (GFP and mCherry, 59022, Chroma Technology Corp). 
Images were collected with a 3CCD Digital Camera C7780 (Hamamatsu Corp) 
using HCImage (Version 1.0.2.060107, Hamamatsu Corp). Images of PD4251 and 
one of PD3363/3364 were taken for each imaging session and compared to ensure 
consistency between days. 

For quantification of GFP to mCherry relative fluorescence, animals were 
imaged using a 4× objective with a broad filter and 200 ms exposure. Investigators 
were blinded during imaging. To avoid image over- or underexposure, a number 
of exceptionally bright or dim strains were taken with a decreased or increased 
(respectively) exposure time (PD1798, 40 ms; PD3294, 500 ms; PD3299, 50 ms; 
PD3395, 500 ms; PD6327, 50 ms; PD6375, 50 ms; PD1786, 50 ms; PD1789, 50 ms; 
PD1790, 50 ms; PD6460, 500 ms; PD6469, 50 ms; PD6471, 50 ms; PD6472, 50 ms; 
PD6473, 50 ms; PD6485, 100 ms; PD1787, 40 ms; PD6450, 100 ms; PD6455, 
40 ms; PD6477, 40 ms; PD6479, 50 ms; PD6498, 40 ms; PD6499, 40 ms; PD6500, 
40 ms; PD6501, 40 ms; PD6502, 40 ms; PD6503, 40 ms; PD6504, 40 ms). 

Raw pixel values for the red and green channels were obtained from image files 
using the tifffile package in Python. Pixels below a threshold distance (200) from 
the median pixel intensity of the entire image were discarded as background. Pixels 
above a threshold intensity distance (4,000 of a possible 4,095) from the origin were 
discarded as saturated. The median pixel intensity for the entire image (essentially 
the black background, given the relatively low density of C. elegans tissue) was 
subtracted from the remaining pixels, and the slope of the linear regression line 
taken as the GFP:mCherry fluorescence ratio. This metric was robust to different 
exposure times and neutral density filters. 
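
A minimal Python sketch of the quantification described above is given here for orientation. It is not the authors' script: the function name `gfp_mcherry_ratio` is hypothetical, image loading (for example with the tifffile package) is omitted, and two points are assumptions, namely that "distance" means Euclidean distance in the red-green intensity plane and that green is regressed on red with a free intercept.

```python
import math
from statistics import median


def gfp_mcherry_ratio(red, green, background_dist=200.0, saturation_dist=4000.0):
    """Slope of green versus red pixel intensities after filtering.

    red, green: flat sequences of per-pixel intensities of equal length.
    """
    red = [float(v) for v in red]
    green = [float(v) for v in green]
    if len(red) != len(green):
        raise ValueError("red and green must have the same number of pixels")

    # The image median approximates the black background.
    med_r, med_g = median(red), median(green)

    xs, ys = [], []
    for r, g in zip(red, green):
        if math.hypot(r - med_r, g - med_g) < background_dist:
            continue  # too close to the median: treated as background
        if math.hypot(r, g) > saturation_dist:
            continue  # too far from the origin: treated as saturated
        xs.append(r - med_r)  # background subtraction
        ys.append(g - med_g)

    if len(xs) < 2:
        raise ValueError("too few pixels left after filtering")

    # Ordinary least-squares slope of green on red.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("red channel has no variance after filtering")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Because the slope is taken after subtracting the image median, a uniform offset in either channel does not change the ratio, which is in line with the robustness to exposure time noted above.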

Statistics. Statistical tests and P values are stated throughout the text and figures. 

To test statistical significance of C-terminal appendage effects on the 
GFP:mCherry fluorescence ratio (Fig. 2b), we divided the data into two groups: 
(1) 3’-UTR-derived (unc-54(TerByP), tbb-2(TerByP), daf-6(TerByP), hlh-1(TerByP), 
rps-20(TerByP), rps-30(TerByP), shuffle1-3); and (2) non-3’-UTR-derived 
(rand1-3(A,C,G,T), 3×HA, 3×Flag, eef-1A.1(CDS63-83), bar-1(CDS452-492), 
daf-6(CDS756-782), mut-16(CDS89-101), alr-1(CDS93-126), hlh-1(CDS290- 
320,syn1)). For each construct, we took the average GFP:mCherry fluorescence 
ratio of all available lines. We compared the distribution of 3’-UTR-derived and 
non-3’-UTR-derived GFP:mCherry fluorescence ratio values by Kolmogorov–Smirnov 
test (P = 0.004). 

No statistical methods were used to predetermine sample size. 

Ribosome footprint profiling. Ribo-seq was performed essentially as previously 
described'*“”, with a few modifications. Briefly, animals were grown to around L4 
stage, and collected by centrifugation and flash-freezing in liquid nitrogen. Animals 
were ground with a mortar and pestle in liquid nitrogen, after which the powder 
was thawed in excess volume ice-cold polysome lysis buffer (20 mM Tris (pH 8.0), 
140 mM KCl, 1.5 mM MgCl₂, 1% Triton) with cycloheximide (100 µg ml⁻¹). RNase 
1 digestion and sucrose gradient centrifugation were performed as previously described”. 
Around 2 µg of purified, RNase-1-digested monosomal RNA was run on a urea 
15% polyacrylamide gel, and the entire region from ~15-30 nt was excised for 
library preparation. At this point, the protocol continued with T4 polynucleotide 
kinase (PNK) (New England Biolabs) treatment as with the RNA-seq1 protocol 
(next section). 

RNA sequencing. Two RNA sequencing (RNA-seq) protocols were used in this 
study. The first RNA-seq protocol (RNA-seq1) was performed on homozygote populations 
of animals (Extended Data Fig. 5). 5 µg of total RNA was treated with the 
RiboZero kit (Illumina). RNA was fragmented at 95°C for 30 min by addition of an 
equal volume of 100 mM sodium carbonate, 0.5 mM EDTA (pH 9.3) buffer. RNA 
fragments were gel-purified, then treated with T4 PNK. 3’-ligation with AF-JA-34.2 
adaptor (/5rApp/NNNNNNAGATCGGAAGAGCACACGTCT/3ddC/, Integrated 
DNA Technologies) and T4 RNA ligase 1 (New England Biolabs) was performed 
at room temperature for 4h with 20% PEG8000 in 3.3 mM DTT, 8.3 mM glycerol, 
50 mM HEPES-KOH (pH 8.3), 10 mM MgCl₂, 10 µg ml⁻¹ acetylated BSA. 
Unligated AF-JA-34.2 was removed by sequential treatment with 5’ deadenylase 
(M0331S, New England Biolabs), then RecJf (M0264S, NEB). Reverse transcription 
was carried out with AF-JA-126 (/5Phos/AGATCGGAAGAGCGTCGTGT/ 
iSp18/CACTCA/iSp18/GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, 
Integrated DNA Technologies) as a primer. Circular ligase treatment and PCR were 
as previously described”. 

A second RNA-seq protocol (RNA-seq2) was used to examine RNA levels with 
small numbers of heterozygote animals (Fig. 3c). Around sixty L2-L4 mixed gender 
animals were picked and flash-frozen in 50 mM NaCl, and RNA extracted with TRIzol. 
RNase H and 94 oligonucleotides complementary to ribosomal RNA were used 
to deplete rRNA from the sample“®. Briefly, ~250 ng of a cocktail of DNA oligonucleotides 
complementary to rRNA (Supplementary Table 3, ordered from Integrated 
DNA Technologies) was mixed with ~100 ng total RNA in 125 mM Tris (pH 7.4), 
250 mM NaCl in 8 µl. The sample was heat-denatured at 95°C for 2 min, then cooled 
at −0.1°C per s to 45°C. 1 µl of digestion buffer was added (500 mM Tris (pH 7.4), 
1 M NaCl, 200 mM MgCl₂) with 1 µl (5 units) Thermostable RNase H (Epicentre), 
and the sample was incubated at 45°C for one hour. DNA oligonucleotides were 
removed by treatment with TURBO DNase (ThermoFisher) at 37°C, and RNA was 
extracted using an equal volume of phenol/chloroform. An RNA-seq library was 
prepared using the SMARTer Stranded RNA-Seq kit (Clontech Laboratories, Inc.). 
Sequencing. Libraries were sequenced on a MiSeq Genome Analyzer (Illumina, 
Inc.). Reads were mapped to the C. elegans genome (Ensembl 70, WBcel215) using 
STAR (v2.3.1 ref. 49), with the mutated bases of unc-54(cc3389) and unc-54(e1301) 
masked. For Ribo-seq and RNA-seq1, reads bearing the same last 6 nucleotides 
(from NNNNNN, added with AF-JA-34.2) were assumed to be PCR duplicates and 
collapsed to a single read. For RNA-seq2, multiple reads containing the same start 
and stop mapping positions were collapsed to a single read count to reduce effects 
of PCR bias. The removal of PCR duplicates with either protocol only affected 
~5-10% of reads and did not adversely impact any of the analyses shown. RNA-seq1 
and Ribo-seq were performed once for each strain shown in Extended Data Figs 5, 6. 
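
The two duplicate-collapsing rules above can be sketched in Python as follows. This is a hedged illustration rather than the pipeline used in the study: the function names are hypothetical, reads are represented as plain strings and alignments as tuples, and it is assumed that reads count as PCR duplicates for Ribo-seq and RNA-seq1 only when both the insert and the trailing 6-nt random tag are identical.

```python
def collapse_tag_duplicates(reads, tag_len=6):
    """Collapse reads sharing the same insert and the same trailing random tag.

    reads: iterable of read sequences whose last tag_len bases are the random
    tag (NNNNNN) contributed by the adaptor. Returns the inserts with the tag
    removed, one per distinct (insert, tag) pair, in first-seen order.
    """
    seen = set()
    unique_inserts = []
    for read in reads:
        if len(read) <= tag_len:
            continue  # nothing left once the tag is removed
        insert, tag = read[:-tag_len], read[-tag_len:]
        if (insert, tag) not in seen:
            seen.add((insert, tag))
            unique_inserts.append(insert)
    return unique_inserts


def collapse_by_position(alignments):
    """Collapse alignments sharing the same mapping coordinates (RNA-seq2 rule).

    alignments: iterable of hashable tuples such as (chromosome, start, end).
    Returns the distinct alignments in sorted order.
    """
    return sorted(set(alignments))
```

Two reads with the same insert but different tags are kept as separate molecules, which is the point of including the random tag in the adaptor.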
Genomes and annotations. Although we sought to use the latest genome versions 
and annotations, we found it prudent to take advantage of the care and time with 
which other researchers annotated and analysed earlier versions of genomes. For 
whole genome alignment of nematode species, C. elegans UCSC genome ce10/ 
WS220 was used. To examine the length of predicted C-terminal extensions upon 
readthrough (Extended Data Fig. 1), genomes and annotations of each of the 
indicated species were as follows: E. coli Ensembl genome and annotations from 
assembly GCA_000967155.1.30, S. cerevisiae genome S288c (R57-1-1_20071212) 
and annotations”, C. elegans UCSC genome (WS190/ce6) and annotations!®, 
H. sapiens Ensembl genome release 83 and annotations from TargetScan v7.0"). 
Immunoblotting. Animals were boiled in 1× SDS loading buffer (65 mM 
Tris (pH 6.8), 10% glycerol, 2% SDS, 2mM PMSF, 1 x Halt Protease Inhibitor 
(Thermo), 10% 2-mercaptoethanol) and run on a 7.5% Criterion TGX gel 
(Bio-Rad Laboratories, Inc.). Protein was transferred to a low background fluores- 
cence PVDF membrane (Millipore). The membrane was blocked in 3% nonfat milk 
in 1 x PBST with 250mM NaCl. The 5-6 antibody was used at a 1:5000 dilution 
to detect myo-3, and 5-8 antibody used at a 1:5000 dilution to detect unc-54 
(ref. 52). The 5-6 and 5-8 monoclonal antibodies were produced previously by 
purification of endogenous myosin proteins. Secondary antibody staining was 
performed with 1:500 Cy3-conjugated affiniPure goat anti-mouse (Jackson 
Immunoresearch). Imaging was performed on a Typhoon Trio (Amersham 
Biosciences), and quantification performed in ImageJ. For the lower blot of 
Fig. 3d, lysates were made from multiple animals, and serial dilutions performed 
to titrate the number of animals per lane. 

Extended Data Figure 1 | Distribution of C-terminal extensions upon 
stop codon readthrough. Annotations and genomes were as described in 
Supplementary Methods. Each 3’ UTR was translated starting one codon 
after the stop codon until the next in-frame stop codon. For metazoans, 
counting was performed in three different ways: including only genes for 
which exactly one 3’ UTR was annotated (blue), counting each annotated 
3’ UTR separately (green), or counting each gene once and splitting gene 
counts with multiple 3’ UTRs equally amongst the 3’ UTR isoforms (red). 
‘Nonstop’ denotes 3’ UTRs for which no stop codon was encountered 
before the poly(A) tail. For each species the distribution of next in-frame 
stop codons was calculated for 1,000 nucleotide shufflings of 3’ UTR 
sequences for genes with a single 3’ UTR annotated, and 95% confidence 
interval shown (yellow). A similar ‘randomized’ distribution was obtained 
upon shuffling 3’ UTR sequences and preserving dinucleotide frequency. 
The frequency of stops immediately after the annotated stop codon 
(amino acid length 0) is highlighted with a blue arrow in each species. 
The distribution of peptide lengths follows an exponential decay curve, 
where the slope is related to the probability of encountering a stop codon 
at each position. In the simplest model, the probability of encountering a 
stop codon is constant throughout the 3’ UTR, accounting for the roughly 
linear shape of each plot (previously noted*”). Notable exceptions are 
a tendency towards second in-frame stops in E. coli (blue arrow), and a 
tendency towards peptides >60 amino acids in length in all species. In 
E. coli, the enrichment towards longer downstream peptides is at least 
partially explained by the operonic layout of genes. 
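
The length of a predicted C-terminal extension, as counted in this figure, can be sketched in Python as below. This is an illustrative helper under stated assumptions, not the authors' code: the name `readthrough_extension_length` is hypothetical, and the input is assumed to be the 3’ UTR sequence beginning at the first nucleotide after the annotated stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def readthrough_extension_length(utr, frame=0):
    """Number of codons translated before the next in-frame stop codon.

    utr: 3' UTR sequence starting immediately after the annotated stop codon
    (DNA or RNA letters). frame: 0, 1 or 2 nucleotides of offset into the UTR.
    Returns None for a 'nonstop' UTR in which no in-frame stop is found.
    """
    seq = utr.upper().replace("U", "T")
    length = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return length
        length += 1
    return None
```

A return value of 0 corresponds to a stop codon immediately after the annotated stop (amino acid length 0 in the plots), and `None` corresponds to the ‘Nonstop’ class.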
Extended Data Figure 2 | Example quantification of the GFP:mCherry 
fluorescence ratios of images. Images were taken under a broad excitation 
and emission filter to allow for simultaneous capture of GFP and mCherry 
fluorescence. Intensities of each pixel in the red and green channels 
were extracted in Python. Unfiltered pixel intensities are shown as black 
dots. Pixels were filtered, background subtracted, and linear regression 
performed (red dots and line, see Methods). For simplicity, the green-red 
intensities from 1,000 random pixels are shown. The GFP:mCherry 
fluorescence ratio was taken as the slope of the linear regression line. 
10× objective. 

Extended Data Figure 3 | Readthrough regions confer a loss of 
superfolder GFP fluorescence. Each of the indicated TerByP regions 
was inserted downstream of superfolder (sf) GFP, upstream of the let-858 
3’ UTR. TerByP is the region after the annotated stop codon, up to and 
including the first in-frame stop codon in the 3’ UTR. Quantification was 
performed as described in Extended Data Fig. 2. 

Extended Data Figure 4 | Explanation of ‘shuffle’ sequences. 
Trinucleotide codons from each TerByP region are colour-coded by gene 
(top). Codons were extracted and randomly shuffled in Python. A codon 
was iteratively selected until a stop codon was encountered, defining 
shuffle1. The process was repeated twice more to define shuffle2 and 
shuffle3. The resulting shuffle peptides are a combination of all three 
TerByP regions. Lengths and colour-coding of codons for shuffle1-3 
accurately reflect the sequences they are derived from. 
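
One way to read the shuffling procedure is sketched below in Python. The details are assumptions made for illustration: the names `split_codons` and `make_shuffles` are hypothetical, codons are drawn without replacement from a single shuffled pool, and each input region is assumed to end with its own stop codon so that three regions give three shuffle sequences.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split a nucleotide sequence into trinucleotide codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def make_shuffles(terbyp_regions, seed=None):
    """Pool codons from all regions, shuffle them, and cut at each stop codon.

    terbyp_regions: sequences that each include their terminal stop codon.
    Returns one sequence per stop codon encountered, each ending in that stop.
    Codons left over after the last stop codon are not used.
    """
    rng = random.Random(seed)
    pool = [c for region in terbyp_regions for c in split_codons(region.upper())]
    rng.shuffle(pool)

    shuffles, current = [], []
    for codon in pool:
        current.append(codon)
        if codon in STOP_CODONS:  # a stop codon closes the current shuffle
            shuffles.append("".join(current))
            current = []
    return shuffles
```

Under these assumptions each shuffle mixes codons from all input regions, and its length depends on where a stop codon happens to fall in the shuffled pool.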
Extended Data Figure 5 | RNA-seq and ribo-seq from unc-54 mutants. 
a-c, RNA-seq (a) and ribosome footprint profiling (ribo-seq) (b) library 
mRNA counts, with summary counts (c) for the indicated strains and 
mRNAs. Libraries were prepared from L4 animals, as described in Methods. 
‘N2’ is wild type (PD1074, VC2010 (ref. 53)). unc-54(cc3389) bears a 
TAA (stop) to AAT (Asn) mutation, unc-54(TerByP). unc-54(e1301) 
has a GGA (Gly387) to AGA (Arg387) point mutation that confers a 
temperature-sensitive Unc phenotype with minimal discernible effects on 
UNC-54 protein levels. unc-54(e1301) was included as a control for the 
Unc phenotype of unc-54(cc3389), though e1301 confers a less severe Unc 
phenotype than cc3389. Values for unc-54 mRNA (blue) are highlighted 
throughout, and for comparison, three additional transcripts known to be 
at least partly expressed in the body-wall muscles are also highlighted: 
unc-87 (pink), unc-15 (green), and unc-22 (red). 

Extended Data Figure 6 | Ribo-seq of unc-54(cc3389) shows an 
unexceptional progression of ribosomes in the readthrough region. 
a, Raw ribo-seq reads for unc-54(+) (blue) and unc-54(cc3389) (green) 
animals, plotted as read pile-ups. Mismatched bases are indicated with 
black bars. Location of the normal stop codon and the first in-frame 
stop codon are indicated with ‘TAA’ and dotted lines. The extension in 
unc-54(cc3389) is 30 amino acids. b, The number of ribo-seq reads in 
the last 30 codons, compared to the previous 30 codons, for all mRNAs. 
Linear regression was performed on all points (solid line), and twofold 
difference shown (dashed lines). c, The distribution of ribo-seq reads in 
the last 30 codons (90 nt) of unc-54(cc3389) is shown in green, and the 
95% confidence interval (CI) for all open reading frames in dashed lines. 
d, The fraction of in-frame ribo-seq reads in the last 30 codons is plotted 
as a function of read counts in the last 30 codons, and unc-54(cc3389) 
highlighted. e, The distribution of read lengths in the last 30 codons of 
unc-54(cc3389), and all open reading frames (95% confidence interval, 
dashed lines). For b-d, reads were restricted to 28, 29, 30 nt lengths. For 
b-e, a 12 nt offset was performed for the ribosomal P-site, and read counts 
were derived solely from the unc-54(cc3389) ribo-seq library. For c and 
e, a minimum 15 read counts was imposed to obtain the 95% confidence 
interval from ‘all genes’. 

Extended Data Figure 7 | Lack of general conservation of coding 
potential downstream of stop codons in Caenorhabditis. Whole-genome 
alignment of six nematode species with C. elegans genome assembly ce10/ 
WS220 was obtained from the UCSC genome browser. For each annotated 
transcript, the aligned bases from the multiple species alignment were 
extracted and compared to the reference (C. elegans) genome. The left 
plot shows summary information of the alignment centred on annotated 
stop codons; the right plot shows the same centred on the first in-frame 
stop codon in 3’ UTRs. In red is the substitution frequency, that is, the 
number of mismatched bases divided by the number of aligned bases 
at a given position. The enrichment of ‘wobble’ position mutations is 
apparent as an increase in substitutions at the third position of each codon 
in the CDS. In green is the synonymous substitution frequency, that is, 
for codons beginning at a given position, the fraction of mutations that 
yield a synonymous substitution divided by all mutations at that position 
(synonymous and non-synonymous). The tendency to conserve amino 
acids in the CDS is apparent as a green spike at every in-frame codon. 

The change in substitution frequency and synonymous substitution 
frequency about the first in-frame stop codon (right plot) is due to a 
tendency for NTR codons to be conserved, and for AAN/AGN/GAN 
codons to not be conserved in 3’ UTRs, regardless of frame. 

Extended Data Figure 8 | Nucleotide and amino acid composition of 
readthrough regions (C. elegans). CDSs and 3’ UTRs were analysed for 
various sequence properties. For simplicity, only genes and 3’ UTRs for 
which a single 3’ UTR was annotated were considered. Similar results were 
obtained with genes with multiple 3’ UTRs. a, Nucleotide frequency of 
CDS, 3’ UTR, and TerByP (region between annotated stop codon and first 
in-frame stop codon). b, Frequency of amino acids in all three possible 
frames for the TerByP region. 3’ UTRs were translated one codon past the 
stop codon of the CDS until the next in-frame stop codon, with nonstop 
3’ UTRs ignored. Highlighted are codons with high G content (GGN, Gly) 
and high T content (TTY, Phe). c, TerByP regions tend to be hydrophobic, 
regardless of frame. Kyte–Doolittle score was used as a measure of 
hydrophobicity™*. To reduce noise, only TerByP regions at least 10 amino 
acids long were considered. P value is from Kolmogorov–Smirnov test 
comparing CDSs and TerByP sequences (each frame has P value < 10e-293 
for this comparison). As the TerByP sequences are shorter than CDSs on 
average, the distribution of TerByP hydrophobicity scores will tend to 
have higher variance than CDSs. Random portions of CDSs were taken, 
length-matched to TerByP frame zero peptide lengths. This was repeated 
100 times, and the 95% confidence interval is shown (dashed lines, ‘CDS 
rands’). d, Hydrophobicity of the inserts is correlated with a negative 
effect on GFP fluorescence. The GFP:mCherry fluorescence ratio 
(Fig. 2b) was plotted against the maximum Kyte–Doolittle score in a six 
amino acid window for each insert. (Similar results were obtained using 
the Kyte-Doolittle score averaged across the entire sequence.) Mean 
(circle) and s.d. (bars) are shown. 3’-UTR-derived sequences are in blue, 
and non-3’-UTR-derived sequences are in red. To avoid redundancy or 
skewing of the data, in cases where multiple constructs were present with 
the same peptide sequence (for example, unc-54(TerByP), unc-54(TerByP, 
syn1), and unc-54(TerByP, syn2)), only the first of these was used. 

e, Hydrophobicity analysis of the TerByP extensions obtained by CRISPR- 
Cas9 engineering at the unc-22 and unc-54 loci. ‘+1/-1 TerByP’ denotes 
the gain or loss of a nucleotide, generating a late frameshift and allowing 
translation to proceed past the annotated stop codon out of frame with the 
upstream open reading frame. In each case, Kyte–Doolittle hydropathy 
was used to analyse the C-terminal appendage. The least phenotypically 
affected strain of the three is shown in bold. 
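
The hydropathy measure used in panel d can be sketched in Python as below. The per-residue values are the standard Kyte–Doolittle scale; the rest is an assumption made for illustration: the function name `max_window_hydropathy` is hypothetical, a window is scored by its mean, and peptides no longer than one window are scored by their overall mean.

```python
# Standard Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(peptide, window=6):
    """Highest mean Kyte-Doolittle score over any window of `window` residues.

    Only the 20 standard one-letter residues are accepted; any other symbol
    (for example '*') raises KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    if not scores:
        raise ValueError("peptide is empty")
    if len(scores) <= window:
        return sum(scores) / len(scores)  # short peptide: overall mean
    best_sum = max(
        sum(scores[i:i + window]) for i in range(len(scores) - window + 1)
    )
    return best_sum / window
```

Taking the maximum over windows emphasizes the most hydrophobic stretch of an appendage, whereas averaging over the entire sequence, which the legend reports gave similar results, weights all residues equally.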
Extended Data Figure 9 | Nucleotide and amino acid composition of 
readthrough regions (H. sapiens). a, b, Similar analysis of hydrophobicity 
as in Extended Data Fig. 8c, d, performed in humans. 

Extended Data Table 1 | Translation into 3’ UTRs at endogenous loci tends to yield hypomorphs 

Gene      Genotype                       Phenotype     Isolates 
pha-4     pha-4(TerByP)                  wild type     2 
unc-22    unc-22(TerByP)                 wild type     2 
          unc-22(-1, TerByP)             twitcher      1 
          unc-22(+1, TerByP)             twitcher      1 
          unc-22(unc-22::GFP)            wild type     3 
unc-45    unc-45(TerByP)                 emb lethal    3 
          unc-45(3xFLAG::TEV::3xHA)      wild type     2 
tra-2     tra-2(TerByP)                  XX males      2 
          tra-2(3xFLAG)                  wild type     2 
          tra-2(3xFLA                    XX males      2 
unc-54    unc-54(TerByP)                 paralyzed     2 
          unc-54(unc-54::gfp)            wild type     2 
          unc-54(+1, TerByP)             paralyzed     1 
          unc-54(-1, TerByP)             weak Unc      1 

CRISPR-Cas9 editing!” was used to construct the mutations shown. See Supplementary Table 1 for precise nucleotide sequences of all strains. ‘−1/+1 TerByP’ indicates the loss or gain of one nucleotide 
relative to the zero frame, generating a frameshift over the stop codon, and translation into the 3’ UTR out of frame with the coding sequence. For unc-22, ‘wild type’ indicates a lack of twitching, 
even in 1 mM levamisole. For unc-54(−1,TerByP), ‘weak Unc’ animals were visibly slower than unc-54(+), but faster than unc-54(TerByP). All mutant phenotypes were recessive. 

Cryo-EM structure of a human cytoplasmic 
actomyosin complex at near-atomic resolution 


Julian von der Ecken1, Sarah M. Heissler2†, Salma Pathan-Chhatbar2, Dietmar J. Manstein2* & Stefan Raunser 


The interaction of myosin with actin filaments is the central feature 
of muscle contraction’ and cargo movement along actin filaments 
of the cytoskeleton”. The energy for these movements is generated 
during a complex mechanochemical reaction cycle**. Crystal 
structures of myosin in different states have provided important 
structural insights into the myosin motor cycle when myosin is 
detached from F-actin®~’. The difficulty of obtaining diffracting 
crystals, however, has prevented structure determination by 
crystallography of actomyosin complexes. Thus, although structural 
models exist of F-actin in complex with various myosins*"', a high- 
resolution structure of the F-actin-myosin complex is missing. 
Here, using electron cryomicroscopy, we present the structure of a 
human rigor actomyosin complex at an average resolution of 3.9 Å. 
The structure reveals details of the actomyosin interface, which 
is mainly stabilized by hydrophobic interactions. The negatively 
charged amino (N) terminus of actin interacts with a conserved 
basic motif in loop 2 of myosin, promoting cleft closure in myosin. 
Surprisingly, the overall structure of myosin is similar to rigor-like 
myosin structures in the absence of F-actin, indicating that F-actin 
binding induces only minimal conformational changes in myosin. 
A comparison with pre-powerstroke and intermediate (Pi-release)’ 
states of myosin allows us to discuss the general mechanism of 
myosin binding to F-actin. Our results serve as a strong foundation 
for the molecular understanding of cytoskeletal diseases, such as 
autosomal dominant hearing loss and diseases affecting skeletal and 
cardiac muscles, in particular nemaline myopathy and hypertrophic 
cardiomyopathy. 
Using electron cryomicroscopy (cryo-EM) and single-particle-based 
analysis of helical specimens (Methods), we determined the structure
of a human actomyosin-tropomyosin (ATM) complex, composed of the
motor domain of non-muscular myosin-2C (NM-2C), cytoplasmic
γ1-F-actin and cytoplasmic tropomyosin 3.1 (Fig. 1, Extended Data
Figs 1a-g and 2 and Supplementary Video 1). We also reprocessed our 
previous F-actin-tropomyosin data set!” and obtained an improved 
reconstruction at 3.6 Å resolution (Extended Data Figs 1h-k and 2a).
The density of tropomyosin did not improve in either data set and is
limited to ~7 Å as described previously”.

The ATM structure reveals that myosin interacts intimately with 
F-actin (Fig. 1a). The overall organization of the ATM complex is 
similar to that described in our previous structure of the complex 


Figure 1 | Structure of the ATM complex. a, Cryo-EM reconstruction
of F-actin (five central subunits in green and cyan) decorated with
tropomyosin (blue) and myosin (central molecules in red). The
peripheral densities (shown in grey) and tropomyosin were low-pass
filtered and symmetrized for better visualization. b, Subdomain
organization of F-actin and myosin head region. The closed
actin-binding cleft between L50 (red) and U50 (dark red) domains is
indicated with a dotted line. c, d, Front and back views of the
F-actin-myosin interface. Involved structural parts of myosin are
highlighted in red. For all figures and videos, we use a general
colour code for each protein and state, if not labelled differently.
The central F-actin subunit is shown in cyan (M-state) and yellow
(A-state); surrounding F-actin subunits are depicted in green
(M-state) and in darker yellow (A-state); rigor state myosin (red),
Pi-release state myosin (blue) and PPS state myosin (purple). Less
relevant parts of models or densities in respective figures are
depicted in grey or faded out.


1Department of Structural Biochemistry, Max Planck Institute of Molecular Physiology, 44227 Dortmund, Germany. 2Institute for Biophysical Chemistry, Hannover Medical School, 30625
Hannover, Germany. 3Division for Structural Analysis, Hannover Medical School, 30625 Hannover, Germany. †Current address: Laboratory of Molecular Physiology, National Heart, Lung, and
Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
between Dictyostelium discoideum myosin-IE, skeletal α-F-actin and
α-tropomyosin!” (Supplementary Videos 2 and 3). However, given the
superior resolution of our present structure, we could clearly identify 
large side chains and therefore reveal most intermolecular interactions 
between F-actin and myosin in detail. 

The large interface between the two proteins is formed mainly 
between the helix-loop-helix (HLH) motif and surface loops of myosin 
(CM-loop, loop 2, loop 3, loop 4, and ‘activation’ loop!) and the 
subdomain (SD) 1 and 2 of one actin subunit and SD2 (D-loop) of 
the adjacent actin subunit as previously predicted®""! (Fig. 1b-d and
Supplementary Video 3). 

Tpm3.1, which is resolved to ~7 Å, is in the same position as skeletal
muscle tropomyosin 1.1 (Tpm1.1) in our previous structure, namely
the M-state’®, interacting with loop 4 of myosin and SD3 of actin 
(Figs 1b, c and 2a). Interestingly, although Tpm3.1 is shorter (stretching 
over six actin subunits) than Tpm1.1 (stretching over seven actin subu- 
nits), the pitch of the coiled-coil structure is equivalent. Tpm3.1, which 
is mainly negatively charged on its surface, interacts with arginine 384 
of loop 4, indicating electrostatic interactions (Fig. 2a). In addition, the 
negatively charged residue D387 on loop 4 (N377-D393) interacts with 
a positively charged region on F-actin (K325, K327) (Fig. 2b). Notably, 
in the absence of myosin, these actin residues directly interact with 
tropomyosin (A-state)!. 

The HLH motif (L550-E575) in the lower 50-kDa (L50) domain of 
NM-2C plays an essential role in strong binding of myosin to F-actin 
(Figs 1c, d and 2c, d). It enters a hydrophobic groove on actin that is 
formed between two adjacent actin subunits comprising SD1 and SD3 
of one and the D-loop (R38-V53) of the adjacent subunit. In particular, 
the hydrophobic loop of the HLH motif interacts with the hydro- 
phobic groove and F560 is completely immersed into a hydrophobic 
cavity resembling a lock-and-key interaction (Fig. 2d, Extended Data 
Fig. 3a-d and Supplementary Video 4). The key role of F560 for the
actomyosin interaction has also been shown by mutational analyses 
in which a F560A mutation resulted in a complete disruption of 
motility, whereas alanine mutants of the directly adjacent W559 and 
P561 showed only one-tenth the motility compared with wild type". 
Interestingly, compared with its position in the pre-powerstroke (PPS) 
state the loop is the only part of the HLH motif that alters its position 
upon actin binding, stressing its important role in the actin-myosin 
interaction (Extended Data Fig. 3a, c). 

In addition to the hydrophobic contacts, there are also two elec- 
trostatic interactions that stabilize the HLH motif binding to F-actin. 
E570 probably forms a salt bridge with K49 of the D-loop and E556 
interacts with the backbone of S349 and T350 in SD1 of F-actin
(Fig. 2c, Extended Data Fig. 3b, e and Supplementary Video 4). 
Both residues are part of a highly conserved acidic patch in several 
myosin classes (Extended Data Fig. 3e, f), and a single point muta- 
tion (E556Q) in myosin results in a tenfold reduced F-actin binding 
affinity’. 

The cardiomyopathy loop (CM-loop), forming one antiparallel 
β-strand pair (T417-T432), is the major site of the myosin upper
50-kDa (U50) domain that interacts with actin (Figs 1c and 2e, f
and Supplementary Video 4). The CM-loop is fully ordered and the 
interface with actin is mainly stabilized by hydrophobic interactions 
(Fig. 2f), supported by weak electrostatic interactions at the tip and the 
base of the CM-loop (Fig. 2e). K429, which is found in all myosin-II
isoforms (Extended Data Fig. 4a, b), interacts with a negatively charged 
patch on F-actin formed mainly by D24 and E333 in SD1 and SD3, 
respectively (Fig. 2e). In addition, the positively charged tip (R424) 
interacts with the negatively charged region around E55 and D92 
of F-actin (Fig. 2e). However, residue 424 is only positively charged 
in smooth and non-muscular isoforms of myosin (Extended Data 
Fig. 4b) and therefore does not play a role in skeletal and cardiac muscles. 
Importantly, we did not find any prominent possible salt bridges 
that would stabilize the contacts between the CM-loop and F-actin, 
supporting previous mutagenesis studies suggesting that charged 
Figure 2 | Interfaces of the ATM complex. a, b, Interface of loop 4
with tropomyosin (blue) and SD3 of one F-actin subunit (cyan).
c, d, Interaction of the HLH motif of myosin (red) with hydrophobic
groove formed by the D-loop of one F-actin subunit (green ribbon in
c and as surface in d) and SD1 and SD3 of the adjacent subunit (surfaces
depicted by low (white) to high (yellow) hydrophobicity). Hydrophobic
residues at the interface (c, d) and a possible electrostatic interaction
(dotted lines (c)) are highlighted. e, f, The CM-loop (red) binds to a
region formed by SD1 and SD3 of F-actin. Charged (e) and hydrophobic
(f) residues of myosin. g, Loop 3 (red) interacts with SD1 of an adjacent
F-actin subunit forming the Milligan contact?”!. The F-actin surface is
coloured either by hydrophobicity (f) or electrostatic Coulomb potential
from −10 kcal mol−1 (red) to +10 kcal mol−1 (blue) (e, g). In all panels,
coloured residue labels depict F-actin residues.


residues, in particular the highly conserved residue D425, play a minor 
role in this interface'*’®. 

As speculated previously’, the highly conserved and disease-related 
residue R419 (R403 in β-cardiac myosin) indeed does not directly interact
with F-actin (Fig. 2e, Extended Data Fig. 4b-d and Supplementary 
Video 4). In our structure, R419 clearly interacts with Y426 on the 
opposing strand of the CM-loop, thereby bridging and stabilizing 
the conformation of the loop (Fig. 2e). Both residues are highly
conserved, suggesting that this bridge is present in all myosin-II isoforms
(Extended Data Fig. 4b). Many disease-causing mutations are found in 
the region of the CM-loop, demonstrating the high importance of the 
CM-loop for the strong binding between actin and myosin (Extended 
Data Fig. 4b-e). 

Several studies showed that loop 2, connecting the L50 and U50 
domains in myosin (W638-T669), plays a major role in the initial 
binding to F-actin'*!!7-!9. We see clear density for loop 2 in the ATM 
structure. However, whereas the base of the loop is ordered, the rest of 
the loop is more flexible (Fig. 3a, b). It occupies a large predominantly 
hydrophobic surface of the actin SD1 domain (Fig. 3c and Extended 
Data Fig. 5a, b). 

A conserved hydrophobic patch at the carboxy (C)-terminal base of 
loop 2 (G665-F667) and the tip of helix-R (W559) (Fig. 3c, Extended 
Data Fig. 3f and Supplementary Video 4) interacts with a hydrophobic
groove of actin SD1 (Fig. 3c and Extended Data Fig. 3d). Mutagenesis 
studies showed that especially W559 is essential for forming the 
F-actin-myosin interface and obtaining motility'*’°. Compared with 
Figure 3 | Stabilization of loop 2. a, Myosin densities of lower resolution
(grey) are shown together with more highly resolved regions of myosin 
(red) and F-actin (cyan). The densities can be assigned to more flexible 
parts of loop 1, loop 2 and the outer lever arm (Methods), respectively. 

b, Close-up view onto region of the density map corresponding to loop 2. 
Loop 3 of the adjacent myosin does not interact with loop 2 (red). 

c, d, Interaction of the stabilized base of loop 2 with SD1 of F-actin 
(coloured by hydrophobicity (c) or by electrostatic Coulomb potential 
(d)). The flexible part of loop 2 and possible electrostatic interactions are 
indicated by different dotted lines. 


their position in the PPS state, these residues orient towards actin to 
stabilize the newly formed interface (Fig. 4a). 

Interestingly, the adjacent conserved positively charged region 
(R661-R664) interacts with the acidic N terminus of actin and a
negatively charged area (D23, D24) of SD1 (Fig. 3d, Extended Data 
Fig. 5b and Supplementary Video 4). While one of the conserved 
arginines (R663) forms a possible salt bridge with actin E3 (Fig. 3d), 
the other one (R664) interacts with D640 and D642 at the other end 
of loop 2, forming an electrostatic belt that stabilizes the base of loop 2 
(Fig. 3c). Notably, D640 and D642 are only conserved in myosins with 
a long loop 2 (Extended Data Fig. 5c), suggesting that the electrostatic
belt is not required for myosins with a shorter loop 2. 

In line with our previous observation!®, loop 3 (Q576-D593) forms 
the so-called Milligan-contact””! connecting the L50 domain to SD1 
of the adjacent actin subunit (Figs 1c and 2g and Extended Data 
Fig. 5d, e). The small interface is only formed by complementary 
charged surfaces and not by specific salt bridges as previously expected 
(Fig. 2g). The weak nature of the interactions, and the fact that not all 
myosin isoforms have a long loop 3 that can form this contact”, suggest 
that, for most myosin proteins, loop 3 plays only an ancillary role in 
strong F-actin binding. 

Several studies suggested that a proline-rich loop in the L50 domain, 
a so-called activation loop (I541-G549), is directly involved in
activation of myosin by interacting with the negatively charged N terminus
of actin'?°. Our ATM structure confirms that this loop is part of the 
actomyosin interface. Together with helix-W and the base of loop 2, 
it forms a positively charged basin that interacts with the negatively 
charged N terminus of actin (Fig. 3d, Extended Data Fig. 6a, b and 
Supplementary Video 4). However, R543, which is the only positively 
charged residue of the loop and conserved in myosin-II (Extended Data 
Fig. 6c), points away from the interface and therefore cannot be involved 
in a direct interaction with the glutamates of the actin N terminus 
(Fig. 4b and Extended Data Fig. 6d). In other myosins with shorter 
Figure 4 | Comparison of PPS and rigor state and induced changes in
F-actin. a, Involved residues of loop 2 and helix-R undergo changes in
rotamer orientation during base stabilization of loop 2 (PPS state (purple),
PDB accession number 5I4E; rigor state (red)). The F-actin surface is
shown in grey. Inset shows a top view of rotating residues of loop 2.
F667 rotates to SD1, R668 rotates outwards to the U50 of myosin.
b, Density comparison of the N terminus between myosin-unbound
F-actin (yellow, A-state) and myosin-bound F-actin (cyan, M-state)
illustrates the pulled conformation of the terminus induced by loop 2
interaction. The flexible N terminus (1-4 in α-actin) in the A-state is
depicted as dotted lines. c, Superposition of SD1 and SD2 of F-actin in
the A-state (yellow) and M-state (cyan) visualizing the myosin-induced
changes in F-actin (indicated by arrows).


activation loops, namely myosin-V, myosin-VI or Dictyostelium 
myosin-I, the position of the positively charged residue is shifted 
by one position and would therefore allow a direct interaction with 
the N terminus (Extended Data Fig. 6c, e, f). Compared with the PPS 
conformation, the proline-rich loop orients slightly closer to F-actin 
(Extended Data Fig. 6b). Since the actin-induced conformational 
changes in the proline-rich loop and the adjacent relay helix are minor, 
we conclude that they are probably not responsible for a direct activa- 
tion of myosin and therefore suggest using the term supporting loop 
rather than activation loop. 

To identify myosin-induced conformational changes in F-actin, we 
compared the ATM structure with our reprocessed F-actin-tropomyosin 
structure. As expected from our previous observations"®, the overall 
changes are minimal. Whereas the D-loop and other interface regions 
orient slightly towards myosin (Fig. 4c and Extended Data Fig. 3a), 
areas interacting with the CM-loop and more distal regions of actin 
only move marginally away from the interface (Extended Data 
Fig. 7a-c). 

The most prominent changes occur at the N and C termini of actin 
(Fig. 4b, c and Extended Data Fig. 7d-k). The highly conserved and 
negatively charged N terminus (Extended Data Fig. 7c), which is only 
partly resolved in F-actin!”, but essential for myosin binding***, is 
completely ordered in the ATM structure and pulled into a positively 
charged basin on the myosin structure (Fig. 4b, c and Extended Data 
Fig. 6a). Two glutamates at its tip (E2, E3) are potentially involved in 
salt bridges with positively charged residues on helix-W and loop 2 
(Fig. 3d). The conformational change of the N terminus is partly trans- 
mitted to the nucleotide-binding pocket (Extended Data Fig. 7e, f). 
Actin residues D10 and K17 slightly change their position (Extended 
Data Fig. 7f, g); however, this does not considerably alter the position 
of ADP and Mg2+ (Extended Data Fig. 7f, h).
Figure 5 | Model of myosin binding to F-actin. a-c, Cartoon
representation of the myosin-F-actin-binding mechanism. Numbers refer
to different steps of the actomyosin interaction and are directly described
in the lower panel. U50 and L50 domains are coloured according to their
respective myosin state: PPS (purple, PDB accession number 5I4E),
Pi-release (blue, homology model of PDB accession number 4PFO (ref. 7))


Interestingly, although the actin C terminus does not participate in 
forming the actomyosin interface, it is completely ordered in the ATM 
structure and orients towards the SD1 domain (Fig. 4c and Extended 
Data Fig. 7i-k). This is in line with previous studies showing that myosin 
binding to F-actin results in quenching of fluorescence of pyrene-
labelled actin at C373 (C374 in α-actin)’. The higher myosin-induced
stability of the D-loop, N terminus and surrounding regions is prob- 
ably responsible for the stabilization of the C terminus, which is not 
well ordered in the F-actin-tropomyosin structure’? (Extended Data 
Fig. 7j). We believe that the minimal but substantial myosin-induced 
conformational changes are exemplary for most actin-myosin interac- 
tions. We therefore suggest using the term M-state for F-actin bound to 
myosin in contrast to A-state for bare F-actin (Figs 1a and 4c).

To understand which conformational changes F-actin binding 
induces in myosin, we compared our rigor ATM structure with 
rigor-like crystal structures (Extended Data Fig. 8). Surprisingly, the 
overall structures of the different rigor-like states are very similar com- 
pared with that of the rigor state (Extended Data Fig. 8a, b), indicating 
that F-actin induces only minimal conformational changes in myosin 
as previously predicted''”’. Differences are found at actin-interacting 
loops that are partly ordered in the crystal structures and are stabilized 
by actin in the ATM structure (Extended Data Fig. 8c). In addition, we 
found that the converter and lever arm regions differ in their position 
relative to the rest of the protein (Extended Data Fig. 8d). Importantly, 
the comparison between rigor and rigor-like structures shows that 
F-actin stabilizes the closed conformation of myosin, but does not 
induce major additional conformational changes. 

To gain further insight into how actin accommodates first a weak and 
then a strong myosin-binding state, we compared our rigor NM-2C 
structure with the crystal structure of the same protein in the PPS state. 
On the basis of recently determined crystal structures of the motor 
domain of myosin-VI in the Pi-release state, an important intermediate
state, a detailed mechanism of myosin binding to F-actin has been
proposed (ref. 7). To obtain the best possible approximation for an
F-actin-NM-2C complex in the Pi-release state, we used a two-step approach.
First, we calculated a homology model of the Pi-release state of NM-2C
on the basis of the atomic model of myosin-VI (ref. 7). On the basis 
of the positions of actin and myosin in our rigor actomyosin structure 
and the myosin PPS structure, we then performed different alignments 
and rigor state (red). Other domains are coloured as in Fig. 1b. F-actin
subunit colours give their current state (A-state, yellow; M-state, cyan and 
green). Loop 2 (grey) and strut (dark blue) are shown as lines. Flexibility in 
loop 2 is indicated by dotted lines. Arrows highlight rotations and binding 
of domains. 


of the Pi-release state model to obtain a model for the actomyosin
complex in its Pi-release state (Fig. 5 and Extended Data Fig. 9a-c).
Exclusive alignment to the U50 domain in all states would cause steric
clashes with F-actin in the Pi-release state (Extended Data Fig. 9a).
Alignment to the L50 domain would result in a rotation of the U50
away from F-actin before the final strong binding (Extended Data
Fig. 9b). We therefore chose a combined alignment instead (Extended
Data Fig. 9c and Supplementary Video 5) to describe the possible
global conformational changes during myosin binding (Extended Data
Fig. 9d-f, Fig. 5 and Supplementary Video 6). We aligned the Pi-release
state model first to the L50 domain of the rigor state. The PPS state was
then aligned to the U50 domain of the Pi-release state.
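
The combined alignment described above amounts to a rigid-body superposition on one chosen domain, with the resulting transform then applied to the whole model. A minimal sketch of that idea follows, assuming a least-squares (Kabsch) fit on matched atom coordinates; the text does not state which fitting routine was used, and the function names `kabsch` and `superpose_on_domain` and the toy coordinates are hypothetical.

```python
import numpy as np

def kabsch(mobile, target):
    """Rotation R and translation t that best map `mobile` onto `target`
    (least squares over matched N x 3 coordinate arrays)."""
    mob_centroid = mobile.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    # Covariance matrix of the centred coordinate sets
    H = (mobile - mob_centroid).T @ (target - tgt_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_centroid - R @ mob_centroid
    return R, t

def superpose_on_domain(model, reference, domain_idx):
    """Fit only the atoms listed in `domain_idx`, then move the whole model."""
    R, t = kabsch(model[domain_idx], reference[domain_idx])
    return model @ R.T + t

# Toy coordinates standing in for matched atoms of two states (hypothetical)
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
model = reference @ Rz.T + np.array([5.0, -2.0, 1.0])

domain_idx = np.arange(20)  # atoms assigned to the domain used for fitting
fitted = superpose_on_domain(model, reference, domain_idx)
rmsd = np.sqrt(((fitted - reference) ** 2).sum(axis=1).mean())
print(f"RMSD after superposition: {rmsd:.2e}")
```

Fitting on a subset and transforming the full model is what makes the choice of reference domain (L50 or U50) matter: atoms outside the fitted domain end up wherever the domain-based transform carries them.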

There is general agreement that loop 2 initiates weak binding of 
myosin-ADP-Pi to F-actin by interacting with the SD1 and SD3 of one
actin subunit'*!>’7-!? (Extended Data Fig. 9d). This brings both the L50 
and U50 close to F-actin (Fig. 5a). As proposed in ref. 7, the L50 domain 
rotates and binds to actin, resulting in the Pi-release state that
represents the initial strong binding state of myosin. The interface is mainly
mediated by hydrophobic interactions of the HLH domain with SD1, 
SD3 and the D-loop and hydrophilic interactions with the N terminus 
of actin, in line with our previous prediction!® (Fig. 5b). On the basis 
of our results, we propose that this process stabilizes the base of loop 2 
(Extended Data Fig. 9e, Fig. 5b and Supplementary Video 6), creating 
a positively charged patch to which the negatively charged strut (bridge 
between L50 and U50) is attracted (Extended Data Fig. 9f and Fig. 5b). 
Thus, the base of loop 2 acts as a key region that shifts the equilibrium 
between the open and closed actin-binding cleft”* towards the closed 
conformation, in which F-actin directly interacts with the CM-loop 
and loop 4 of the U50 domain (Fig. 5c). Notably, both the strut and 
the highly conserved basic region at the base of loop 2 have previously 
been shown to be essential for strong binding'”?””’. In our model, the 
closure of the cleft is mediated by a rotation of the U50 domain towards 
F-actin and not by a back-rotation of the L50 domain (Fig. 5c). 

As described before (reviewed in ref. 4), cleft closure results in the 
strong binding of myosin to actin, providing the necessary anchoring 
of the motor domain for the subsequent powerstroke (Fig. 5c). The 
time point and effect of Pi and ADP release during this process are highly
debated”°. Because we lack high-resolution structures of intermediate
states of myosin bound to F-actin, we cannot determine whether Pi is
released before or after the powerstroke. However, our observation that
the rigor state is very similar to the rigor-like state suggests that actin 
promotes and stabilizes the closed conformation of myosin, ultimately 
resulting in the release of phosphate and ADP. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein expression and purification. G-actin (γ1-actin, ACTG1 from Homo
sapiens) was recombinantly expressed using the baculovirus/Sf9-cell system and 
purified as described previously*!**. Afterwards, the sample was polymerized to 
F-actin by increasing the salt concentration to 100 mM KCl and 2 mM MgCl2.
Tropomyosin 3.1 (TPM3 from H. sapiens, isoform Tpm3.1) was expressed and 
purified from Escherichia coli on the basis of the protocol of ref. 33 with no addi- 
tional modifications. The motor domain of non-muscular myosin-2C (MYH14, 
isoform 2 from H. sapiens) consisting of amino acids 1-799 was directly fused to 
an artificial lever arm (spectrin repeats 1 and 2 from α-actinin)** and a C-terminal
Flag-tag. The protein was recombinantly overproduced in the Sf9/baculovirus 
system as previously described™ and purified via Flag capture and size-exclusion 
chromatography on a Superdex 26/60-200 prep grade column. Before grid prepara- 
tion for electron microscopy studies, the F-actin sample was spun down (100,000g) 
and carefully suspended in nucleotide-free F-actin buffer (5 mM Tris-HCl pH 
7.5, 1 mM DTT, 100 mM KCl and 2 mM MgCl2). The actomyosin complex was
prepared by mixing F-actin with tropomyosin initially at a molar ratio of 7:1. The 
final concentration of tropomyosin for frozen specimens was then adjusted
empirically to obtain complete decoration with little unbound tropomyosin in the
background, as judged by standard negative-stain studies described previously". The
F-actin-tropomyosin filaments were decorated with myosin during preparation 
of vitrified sample grids (see below). 

Optimized grid preparation and image acquisition for cryo-EM. Best condi- 
tions for reconstituting the F-actin-tropomyosin complex were screened by using 
the negative-staining protocol as described above. When myosin was incubated
in solution with the optimized F-actin-tropomyosin sample before applying the 
sample to the grid, we always obtained only bundles of fully myosin-decorated 
actin filaments. We therefore modified the standard cryo-preparation
protocol to reconstitute the full actomyosin filaments directly on the grid,
which reduced bundling. First, we applied 2 µl of F-actin-tropomyosin
solution to a glow-discharged holey carbon grid (C-flats 2/1,
Protochips), incubated for 20 s and manually blotted from the backside for less
than a second with filter paper. A thin layer of solution stayed on the grid and
the filaments were pre-straightened in the holes. Afterwards, 1.5 µl of myosin
solution (3 µM in F-actin buffer without nucleotide) was added directly to the
grid, incubated for 10 s and then manually blotted for 5 s from the backside with
filter paper (Whatman no. 5), before vitrification by plunging the grid into liquid 
ethane using a Cp3 plunger (Gatan). 

Screening for the best sample and blotting conditions was performed on a JEOL
JEM 3200FSC electron microscope equipped with a field emission gun and
operated at a voltage of 200 kV. The omega in-column energy filter of the microscope
was used to estimate best ice conditions (~70-100 nm thickness). Finally, a data 
set was taken with a spherical aberration-corrected FEI Titan Krios transmission 
electron microscope equipped with an extra-high brightness field emission gun 
(X-FEG) and operated at a voltage of 300 kV. Although the sample preparation pro- 
tocol was optimized, we had to screen and choose usable grid squares extensively. 
Images were recorded with a back-thinned 4k x 4k FEI Falcon 2 direct detection 
camera under minimal dose conditions using the automatic data collection 
software EPU (FEI). Within each selected grid hole, three different positions were 
imaged, each with a total exposure of 1s and a frame recording time of 55 ms. Seven 
frames from 85 to 475 ms with a total dose of ~16 electrons per square ångström
and one total average (integrated image) with an electron dose of ~35 electrons
per square ångström were used for image processing. The magnification used,
125,000 (nominal magnification 59,000), corresponds to a pixel size of 1.1 Å. The
defocus range of the data set was 0.7-2.8 µm (Extended Data Table 1).
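
The relation between magnification and pixel size quoted above can be checked with a short calculation. This is a sketch only: the 14 µm detector pixel pitch is an assumption that is not stated in the text, and the function name is hypothetical.

```python
# Assumption (not given in the text): physical pixel pitch of the detector.
DETECTOR_PIXEL_UM = 14.0
MAGNIFICATION = 125_000  # magnification quoted in the text

def pixel_size_angstrom(detector_pixel_um, magnification):
    """Pixel size at the specimen level in angstrom.

    Detector pixel pitch divided by magnification; 1 um = 1e4 angstrom.
    """
    return detector_pixel_um * 1e4 / magnification

pixel = pixel_size_angstrom(DETECTOR_PIXEL_UM, MAGNIFICATION)
nyquist = 2.0 * pixel  # finest resolvable spacing at this sampling
print(f"pixel size ~ {pixel:.2f} A, Nyquist limit ~ {nyquist:.2f} A")
```

Under this assumption the result rounds to the 1.1 Å quoted in the text, which places the sampling limit a little above 2 Å, well beyond the 3.9 Å average resolution reported for the map.
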
Image processing of the cryo-EM data set. In total, we collected ~6,300 images 
in two sessions. Despite the extensive pre-screening of grid squares before starting 
automatic data collection (see above), we deleted ~68% of the recorded images 
because of bundled filaments, contaminations or bad ice quality. Resulting frames 
were aligned and afterwards summed up using motion correction*’. The drift- 
corrected averages were used for determination of defocus and astigmatism values 
with CTFFIND3*°. Filaments were manually selected (Extended Data Fig. 1a) and 
exported from the 1s integrated images using sxhelixboxer in SPARX* without 
changing the orientation of the filaments to the y axis. A total of ~138,000 seg- 
ments were extracted with a box size of 256 pixels and a boxing distance of 29 pixels 
(overlap ~90%). Thus, the approximate distance between them (~32 Å) slightly
exceeded the rise of the helical assembly of actin (~27-28 Å). The same procedure
was applied to the drift-corrected average and all individual frames. Afterwards all 
segments were transformed to RELION**-readable image stack formats and initial 
metadata files were created for further refinement steps in RELION. 
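
The segmentation numbers in this paragraph follow from the box size, boxing distance and pixel size. The short calculation below reproduces them; the variable names are illustrative, and the helical rise of 27.5 Å is taken as the midpoint of the quoted ~27-28 Å range.

```python
BOX_SIZE_PX = 256      # box size from the text
STEP_PX = 29           # boxing distance from the text
PIXEL_SIZE_A = 1.1     # pixel size from the text (angstrom)
HELICAL_RISE_A = 27.5  # assumed midpoint of the quoted ~27-28 A range

# Fraction of each box shared with the next box along the filament
overlap = 1.0 - STEP_PX / BOX_SIZE_PX
# Distance between consecutive segment centres along the filament
step_angstrom = STEP_PX * PIXEL_SIZE_A
# Length of filament covered by one box
box_angstrom = BOX_SIZE_PX * PIXEL_SIZE_A
# How many actin subunits the box advances per step
subunits_per_step = step_angstrom / HELICAL_RISE_A

print(f"overlap           ~ {overlap:.1%}")
print(f"step              ~ {step_angstrom:.1f} A")
print(f"box edge          ~ {box_angstrom:.1f} A")
print(f"subunits per step ~ {subunits_per_step:.2f}")
```

A step slightly larger than one helical rise means that consecutive segments are centred on roughly successive subunits without sampling the same subunit twice.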


First, two-dimensional reference-free classification of the integrated images and
removal of bad classes yielded a data set of 126,000 particles
(Extended Data Fig. 1b). The resulting data set was used in several rounds of three- 
dimensional auto-refinements with particles from the integrated images and local 
three-dimensional auto-refinements with the particles from the drift-corrected 
averages. The refinement showed an expected Gaussian distribution of projection 
direction around the filament axis (Extended Data Fig. 1d, e). To improve processing 
time, we limited the tilt angle afterwards (Extended Data Fig. 1d). Finally, we 
applied a particle-based movie refinement and frame weighting” and continued 
three-dimensional auto-refinements with the resulting contrast-enhanced 
particles. We did not make use of helical symmetry during refinement but masked 
F-actin and myosin in the outer regions to focus the refinement on the central 
parts. Initially, we applied a standard spherical mask (diameter 270 Å, Extended
Data Fig. 1d) to the reconstruction. After a global refinement with the spherical 
mask, we continued with local refinement and a mask at the size of seven F-actin 
subunits and six myosin molecules. We used a preliminary model and the ‘Colour
zone’ and ‘Split Map’ options in CHIMERA**"' to extract the central part of the
map from the current density map. From this part of the map we calculated a
smoothed mask with the ‘relion_mask_create’ function in RELION. To evaluate
the results, we again performed a two-dimensional classification with projection 
parameters derived from three-dimensional refinement (Extended Data Fig. 1c). 
In single class averages, we could already detect secondary structure elements 
(Extended Data Fig. 1f, g). As a final sorting step, we deleted particles that
were outliers with respect to their neighbouring segments from the same
filament. To this end, we kept track throughout processing of the filament
to which each particle belongs. This resulted in 118,000 particles, which
we used for a local three-dimensional auto-refinement. In the final iteration, we 
applied a mask of the central five F-actin subunits and two central myosin mole- 
cules (coloured region in Fig. 1a). This mask was created as described above for 
the mask of seven F-actin subunits and six myosin molecules. Outer regions of the 
lever arm were also excluded. 
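
The final sorting step relies on knowing which filament each segment came from. The text does not say which alignment parameter was compared or what cut-off was applied, so the following is only a sketch under stated assumptions: it compares the in-plane rotation angle of each segment with the circular mean of its filament and flags deviations above a hypothetical tolerance of 20°. All names and values are illustrative.

```python
import numpy as np

def angular_difference(a, b, period=360.0):
    """Smallest signed difference a - b on a circle of the given period (degrees)."""
    return (a - b + period / 2.0) % period - period / 2.0

def flag_outliers(filament_ids, psi_deg, tolerance_deg=20.0):
    """Boolean array, True where a segment deviates from its filament consensus.

    Assumption: segments cut from one filament should share a similar
    in-plane angle, so the consensus is the circular mean per filament.
    """
    filament_ids = np.asarray(filament_ids)
    psi = np.asarray(psi_deg, dtype=float)
    outlier = np.zeros(psi.shape, dtype=bool)
    for fid in np.unique(filament_ids):
        idx = np.nonzero(filament_ids == fid)[0]
        ang = np.deg2rad(psi[idx])
        consensus = np.rad2deg(np.arctan2(np.sin(ang).mean(), np.cos(ang).mean()))
        deviation = np.abs(angular_difference(psi[idx], consensus))
        outlier[idx] = deviation > tolerance_deg
    return outlier

# Hypothetical example: two filaments, one flipped segment in the first
filament_ids = [0, 0, 0, 0, 0, 1, 1, 1]
psi_deg = [10.0, 12.0, 8.0, 11.0, 190.0, 355.0, 2.0, 358.0]
flags = flag_outliers(filament_ids, psi_deg)
print(flags)
```

The wrap-around handling matters for filaments whose angles straddle 0°/360°, as in the second filament of the example, where a naive arithmetic mean would wrongly flag every segment.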

Fourier shell correlation (FSC) analysis was performed within the central area 
(five F-actin subunits and two myosin molecules) of the volume, resulting in an 
average resolution of 3.9 Å (FSC0.143 criterion”) for the F-actin-myosin electron
density map (Extended Data Fig. 2a). The density map of F-actin-myosin was then
sharpened using a negative B-factor of −200 Å2 and filtered to its nominal
resolution. Because tropomyosin was masked out during refinements as we did before!’,
we filtered the tropomyosin density map to ~7 Å and merged it with the final
F-actin-myosin map to obtain a map of the entire F-actin-myosin-tropomyosin 
complex. 
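
The Fourier shell correlation used for the resolution estimate compares two independently refined half-maps shell by shell in reciprocal space. The sketch below is a generic NumPy illustration of that calculation and of the 0.143 cut-off; it is not the RELION implementation, it ignores masking, and the function names and synthetic half-maps are hypothetical.

```python
import numpy as np

def fourier_shell_correlation(half1, half2):
    """FSC per integer-radius shell for two cubic maps of identical shape."""
    assert half1.shape == half2.shape and len(set(half1.shape)) == 1
    n = half1.shape[0]
    F1 = np.fft.fftn(half1)
    F2 = np.fft.fftn(half2)
    # Integer frequency index along each axis, in FFT output order
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2
    keep = shell < n_shells  # discard corners beyond Nyquist
    cross = np.bincount(shell[keep], weights=(F1 * np.conj(F2)).real[keep],
                        minlength=n_shells)
    power1 = np.bincount(shell[keep], weights=(np.abs(F1) ** 2)[keep],
                         minlength=n_shells)
    power2 = np.bincount(shell[keep], weights=(np.abs(F2) ** 2)[keep],
                         minlength=n_shells)
    denom = np.sqrt(power1 * power2)
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)

def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (angstrom) of the first shell whose FSC drops below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_size  # never drops: limited by Nyquist
    shell_index = below[0] + 1
    return box_size * pixel_size / shell_index

# Synthetic half-maps: a smooth blob plus independent noise (hypothetical)
rng = np.random.default_rng(1)
ax = np.arange(32) - 16
x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
signal = np.exp(-(x**2 + y**2 + z**2) / (2 * 3.0**2))
half1 = signal + 0.05 * rng.normal(size=signal.shape)
half2 = signal + 0.05 * rng.normal(size=signal.shape)

fsc = fourier_shell_correlation(half1, half2)
estimate = resolution_at_threshold(fsc, box_size=32, pixel_size=1.1)
print(f"estimated resolution: {estimate:.1f} A")
```

Because the curve is computed only within the volume supplied, restricting the half-maps to the central subunits, as described above, reports the resolution of that region and not of the whole box.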

Local resolution was estimated using ResMap* on the full density map without 
masking (Extended Data Fig. 2b), revealing a resolution gradient from higher 
(~3.5 Å, F-actin core) to lower (4.5-5 Å, outer myosin domains) values, which
could be a result of induced forces from the protruded lever arm (Extended Data 
Fig. 2c-e). We converted the local resolution map to absolute frequencies and 
applied a local filtering algorithm on the final map with sxfilterlocal in SPARX. To 
estimate and confirm the helical symmetry of the actomyosin complex, we used 
the symmetry search function of sxhelicon_utils in SPARX. 
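
Converting a local resolution value to an absolute frequency can be illustrated as follows. This sketch assumes that "absolute frequency" means cycles per pixel, ranging from 0 to 0.5 at Nyquist, so that the conversion is the pixel size divided by the resolution; that convention is an assumption here and the function name is hypothetical.

```python
PIXEL_SIZE_A = 1.1  # pixel size from the text (angstrom)

def to_absolute_frequency(resolution_angstrom, pixel_size_angstrom):
    """Resolution in angstrom -> frequency in cycles per pixel (0 to 0.5)."""
    return pixel_size_angstrom / resolution_angstrom

# Local resolution values quoted in the text, plus the Nyquist limit
for label, res in [("F-actin core", 3.5),
                   ("outer myosin domains", 5.0),
                   ("Nyquist", 2.0 * PIXEL_SIZE_A)]:
    freq = to_absolute_frequency(res, PIXEL_SIZE_A)
    print(f"{label:22s} {res:4.1f} A -> {freq:.3f} cycles/pixel")
```

Expressed this way, better-resolved regions receive a higher cut-off frequency in the local filter, and less well-resolved regions a lower one.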

We reprocessed our previous F-actin-tropomyosin data set with the same
protocol as described above (Extended Data Fig. 1h-k) and obtained an improved
reconstruction at an average resolution of 3.6 Å for the central five F-actin
subunits (Extended Data Fig. 2a). During the helical reconstruction approach, several
asymmetrical units are averaged by symmetrization. This results in a decrease of 
resolution in flexible and bent regions. However, using the single particle approach, 
we refine only on the central region, thereby probably decreasing the influence 
of flexibility induced by the bending of the filament. This probably improves the 
resolution of the resulting reconstruction. 
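
To make the symmetrization step concrete, the sketch below generates the positions of successive asymmetrical units by repeated application of a helical operator (a rotation about the filament axis plus a rise along it). The rise of 27.5 Å and rotation of 166.9° are the values listed in Extended Data Table 1; the function name, the choice of the z axis as filament axis, the sign of the rotation and the starting point are assumptions made for illustration.

```python
import math


def helical_copies(point, rise, rotation_deg, n_units):
    """Return n_units copies of `point` under successive helical operations.

    point: (x, y, z) in Å; the helix axis is assumed to be the z axis.
    Copy k is rotated by k * rotation_deg about z and shifted by k * rise.
    """
    x, y, z = point
    copies = []
    for k in range(n_units):
        angle = math.radians(k * rotation_deg)
        copies.append((
            x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z + k * rise,
        ))
    return copies


if __name__ == "__main__":
    # Hypothetical point 10 Å from the filament axis; five subunits.
    for copy in helical_copies((10.0, 0.0, 0.0), 27.5, 166.9, 5):
        print(copy)
```

Symmetrization assumes that every copy generated this way is equivalent, which holds only for a straight, rigid filament. Refining only the central region avoids imposing that assumption on bent segments.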

For each data set, we provide two density maps in the Electron Microscopy
Data Bank (EMDB). One entry contains the map after ‘post-processing’
in RELION without masking, the other one the same map locally filtered, masked 
and merged with the filtered tropomyosin density map at its respective position 
(as described above). 

Atomic model building and refinement. The central F-actin subunit (chain A) 
and the central myosin molecules (chain F, G) show all available contacts to adja- 
cent chains (B, C, D, E) and were therefore used for further structural analysis. 
We used our previous F-actin model as a starting model for F-actin and the
highly homologous model of rigor-like NM-2B (PDB accession number 4PD3)
for myosin. We performed homology modelling with the respective sequences of
both proteins using MODELLER. Derived models were rigid-body fitted into
the density map, using ‘Fit in Map’ and the map was segmented with ‘Split Map’
in CHIMERA. Additionally, models were flexibly fitted with iMODFIT
into the respective density maps. Finally, the flexibly fitted models were used to
create a starting F-actin-myosin model. The electron density was converted to
structure factors with the CCP4 program suite. The model and the density map
were then used for real space refinement and model building in COOT. Flexible
parts of the N terminus (residues 1-49), loop 1 (residues 222-232), loop 2 (residues
640-661) and lever arm (residues 817-1039) of myosin were deleted from the model.
Some parts (especially surface loops and termini) of F-actin and myosin were built
de novo. Domains showing resolution lower than 4.5 Å (outer myosin regions) were
not manually changed and only the backbone trace was optimized.

The resulting model was refined in REFMAC, using the modified version of
the program for cryo-EM maps and applying secondary structure and reference
restraints derived by ProSMART. In addition, we added non-crystallographic
symmetry restraints for F-actin (chains A-E) and myosin (chains F and G). To prevent
overfitting, we first determined refinement settings by refining the model (atoms
randomly displaced by 0.5 Å) only versus a density map belonging to one half
of the data set and compared the FSC curves of the refined model with both
half maps (Extended Data Fig. 2f). Finally, the model was refined versus the full
map with the derived refinement parameters, which did not show significant
differences in the two FSC curves (Extended Data Fig. 2f). We used
MOLPROBITY to evaluate the resulting atomic model. The data statistics are
given in Extended Data Table 1.

For the reprocessed F-actin-tropomyosin data set, we used the same refinement
strategies as for the actomyosin data set. The final data statistics are given in
Extended Data Table 1.

Fit of the tropomyosin models. To describe the interaction between F-actin-myosin
and tropomyosin, we used the tropomyosin model from our previous structure.
As already described in the main text, the pitch of the coiled-coil structure is
equivalent. We directly rigid-body fitted the tropomyosin model (PDB accession
number 4A7F) into the density map, using ‘Fit in Map’ in CHIMERA. We
reduced the model to the five central pseudo-repeats as before and shifted the
residue numbering regarding the differences from long Tpm1.1 to the shorter 
Tpm3.1. Owing to the limited resolution of the cryo-EM density in the region of 
tropomyosin, we avoided interpretation of tropomyosin at the single amino-acid 
level. 

For the reprocessed F-actin-tropomyosin model, we could use the tropomyosin
model from our previous model (PDB accession number 3JA8), as the structure
and resolution in that region did not differ.

Structure analysis and visualization. For visualization of models and density
maps in all figures and videos, we used CHIMERA. The actomyosin complex was
protonated using H++ (ref. 52) at pH 7.5, and the electrostatic Coulomb potential
of the filament surface was calculated ranging from −10 to +10 kcal mol⁻¹. For
visualization of the hydrophobicity per amino-acid residue, we used ‘Define attribute’
in CHIMERA and generated amino-acid-specific scores. The densities of
flexible parts of myosin were detected by using the ‘Colour Zone’ function on a
low-pass-filtered density map in CHIMERA. For comparison of differences in
density of A-state and M-state F-actin, we low-pass filtered both density maps
to the same resolution of 3.9 Å. Sequence alignments were performed using
the ClustalOmega online server. The HGMD library and UNIPROT were
browsed to find mutations in regions of interest. For creating a homology model of
NM-2C in the Pi-release state, we used MODELLER in the CHIMERA ‘Multalign
Viewer’ interface with NM-2C as target sequence and the coordinates of the
previously determined crystal structure of the Pi-release state of myosin-VI (PDB
accession number 4PFO) as reference structure.
Extended Data Figure 1 | Micrographs, two-dimensional classifications
and three-dimensional refinement. a-c, Representative of ~2,000 digital
micrographs (a) and of 200 two-dimensional class averages of the
F-actin-tropomyosin-myosin data set before (b) and after (c)
three-dimensional refinement, respectively. Lower part of the micrograph
is band-pass filtered to allow a better visualization of the filaments.
Only filaments in rectangular boxes were chosen for refinement and
bundled filaments were sorted out. d, e, Box dimension and angular
distribution during three-dimensional refinement in side (d) and top
(e) views. Histogram (few in blue to many in red) shows distribution
of projection direction of each boxed segment relative to the
three-dimensional reconstruction (grey). f, Example of two class averages out
of c that show secondary structure elements. g, Fit of F-actin-myosin
model to assign characteristic domains of myosin (see coloured circles
and boxes). h, i, Representative of 200 class averages of the reprocessed
F-actin-tropomyosin data set before (h) and after (i) three-dimensional
refinement. j, k, In class averages, secondary structure elements in
F-actin (green) and the coiled-coil structure of tropomyosin (yellow) are
visible. Scale bars in micrograph and class averages are 50 nm and 10 nm,
respectively.
Extended Data Figure 2 | Resolution and model refinement of the
actomyosin complex. a, FSC curves of the cryo-EM reconstruction of the
F-actin-tropomyosin-myosin data set (blue) and the reprocessed data
set of F-actin-tropomyosin (red). The average resolution (FSC0.143) of the
final electron density maps (central parts, green in subfigures) is estimated
at 3.9 Å and 3.6 Å, respectively. Next subfigures illustrate only the
actomyosin data set. b, Colour-coded local resolution of the full map and
only finally refined part of the map (see Methods) estimated by ResMap.
c-e, Representative regions with higher than the average resolution in the
F-actin filament core (c), the average resolution at the interface (d) and
lower resolution in outer myosin parts (e). f, FSC curves of the model to
each half map to check for overfitting, when the model was only refined
versus the first half map. Black curve shows FSC between refined model
and full map, when the model was refined against the full map (see
Methods). g, h, B-factor distribution of final model from low (blue) to
high (red) values. The absolute value strongly depends on the sharpening
factor of the map, while the distribution shows the same gradient as the
local resolution in b.
Extended Data Figure 3 | HLH motif bound to F-actin. a, Front view of
F-actin and the HLH motif of the L50 domain of myosin show only small
changes in loop regions while helices do not alter between weak (PPS
state in purple, PDB accession number 5I4E) and strong binding (rigor
state in red). The D-loop is moved towards the binding interface and is
stabilized (A-state in yellow, M-state in green). Arrows indicate changes
and scale bar is given. b, Same view as before shows the interface of myosin
and F-actin in the rigor state. One possible salt bridge is highlighted with
dotted lines. Surfaces are coloured from low (white) to high (yellow)
hydrophobicity. c-e, Back view of the HLH motif and the base of loop 2
bound to central (SD1, SD3) and adjacent (D-loop) F-actin subunits.
Comparison of rigor (red) and PPS state (purple, PDB accession number
5I4E) shows main differences (c). Final interaction of fully bound myosin
is given in d, e. Possible electrostatic interactions are indicated by dotted
lines. F-actin surface is depicted per subunit colour (c), by hydrophobicity
(d) or electrostatic Coulomb potential (e, −10 kcal mol⁻¹ in red to
+10 kcal mol⁻¹ in blue). In all subfigures, coloured residue labels belong
to F-actin. f, Sequence alignment of myosin (H. sapiens myosin-II, -I, -III,
-V, -VI) in the region of the HLH (helix-R-loop-helix-S) motif. Important 
functions at the F-actin-myosin interface and roles in stabilizing these 
regions themselves are highlighted and labelled. Residue numbering refers 
to our published structure belonging to the sequence of NM-2C (depicted 
in bold type). Tissue localization of myosin-II is written in parentheses. 
We refer to the different myosin isoforms according to the nomenclature 
for the genes encoding the respective myosin heavy chains. 
Extended Data Figure 4 | Cardiomyopathy loop and disease-causing
mutations. a, Conservation of the CM-loop in the human myosin-II
class is visualized as a model on F-actin (cyan) from low (white) to high
(purple) conservation. b, Sequence alignment of the CM-loop region of
the human myosin-II class. Important functions at the F-actin-myosin
interface are highlighted and labelled. Residue numbering refers to our
published structure belonging to the sequence of NM-2C (depicted in bold
type). Tissue localization of myosin-II is written in parentheses. We refer
to the different myosin isoforms according to the nomenclature for the
genes encoding the respective myosin heavy chains. c, d, Mutations in
β-cardiac myosin (MYH7) can lead to cardiomyopathies. Corresponding
residues in β-cardiac myosin are illustrated with our rigor state model (c)
and known mutations are listed (d). e, Table of known disease-causing
mutations at the actomyosin interface. Numbers in parentheses
give respective residue position in our published structure of NM-2C.
Localization is described in parentheses.
Extended Data Figure 5 | Loop 2 and loop 3 on F-actin. a, b, Density
map (grey) corresponding to the flexible part (residues 641-661) of loop
2 (red). F-actin is shown as surface model coloured from low (white) to
high (yellow) hydrophobicity (a) or electrostatic Coulomb potential
(b, −10 kcal mol⁻¹ in red to +10 kcal mol⁻¹ in blue). Residue labels
belonging to F-actin are coloured as surface colours. c, Sequence
alignment of myosin (H. sapiens myosin-II, -I, -III, -V, -VI) in the
region of loop 2 and helix-R of the HLH region. Important functions
at the F-actin-myosin interface and in stabilizing these regions are
highlighted and labelled. Residue numbering refers to our published
structure belonging to the sequence of NM-2C (depicted in bold). Tissue
localization of myosin-II is written in parentheses. We refer to the different
myosin isoforms according to the nomenclature for the genes encoding
the respective myosin heavy chains. d, e, Changes between rigor (red) and
PPS state (purple) in the loop 3 region relative to the rest of lower 50-kDa
domain when bound to F-actin. Movements are indicated by black arrows
and scale bars are given.
Extended Data Figure 6 | Sequence-dependent interaction of
supporting loop with the N terminus of F-actin. a, Surface of myosin and
N terminus is depicted by electrostatic Coulomb potential (−10 kcal mol⁻¹
in red to +10 kcal mol⁻¹ in blue). Involved charged residues are labelled.
b, Position of the proline-rich loop (supporting loop) located between
relay helix and helix-R slightly differs between the PPS (purple, PDB
accession number 5I4E) and rigor state (red) and shows no direct
interaction with the N terminus of F-actin. Regions at the surface of SD1
are pulled to the actomyosin interface indicated by an arrow and a scale
bar is given (F-actin in A-state is depicted in yellow; F-actin in M-state is
depicted in cyan). c, Sequence alignment of myosin (H. sapiens myosin-II,
-I, -III, -V, -VI) in the region of the supporting loop. Different lengths of
the loop and a possible supporting function are given in the last column.
Residue numbering refers to our published structure belonging to the
sequence of NM-2C (depicted in bold). Tissue localization of myosin-II
is written in parentheses. We refer to the different myosin isoforms
according to the nomenclature for the genes encoding the respective
myosin heavy chains. d-g, Comparison of prominent properties in the
supporting loop of different myosin classes (comparative models in
purple) and their ability to undergo a direct interaction with the
N terminus. Main differences are length of loop (numbers give absent amino
acids relative to long loop) and position of the prominent positive-charged
amino acid (R or K). Only an arginine or lysine sitting on the top would
allow a direct interaction (e-g), while a sideward-oriented arginine (d) or a
short loop (e) disables or reduces a possible interaction, respectively.
In addition, respective densities (d) of the cryo-EM map are displayed.
Extended Data Figure 7 | Myosin-induced conformational changes in
F-actin. a-c, Comparison of bare F-actin (A-state, yellow) with
myosin-bound F-actin (M-state, cyan). Myosin is depicted in red. Either models
(a) or representative parts of the electron density maps (b) illustrate
conformational changes in F-actin (c). d, Sequence alignment of the
N terminus region of human actin isoforms. Residue numbering refers
to our published structure belonging to the sequence of non-muscular
γ1-actin (ACTG1, depicted in bold type). To prevent confusion, the
gene names instead of protein names are given. Localization is written
in parentheses. e, N terminus and nucleotide binding region of F-actin
undergo small changes (highlighted with arrows) through transmitted
force of N terminus pulling. f, g, Close-ups of structural changes at
the nucleotide binding site. h, Coordination of ADP and Mg²⁺ in the
nucleotide binding cleft in M-state F-actin. i-k, Myosin binding induces a
stabilization and shifting of the C terminus towards SD1 of F-actin. C373,
which was used for pyrene labelling of F-actin, is part of the C-terminal
region. Scale bars are given in the subfigures.
Extended Data Figure 8 | Comparison of rigor and rigor-like myosin
structures. a-d, Close-ups of superimposed models from rigor-like
structures (nucleotide-free myosin crystal structures in light green, PDB
accession number 4PD3 (ref. 44) and 1OE9 (ref. 27)) with our rigor
cryo-EM structure (red). F-actin is shown as surface model (green, cyan).
Illustrated domains are labelled and coloured, while the rest of myosin
is shown in grey from the rigor state model. Most regions do not show
conformational differences (a, b), but the surface loops of myosin
(CM-loop, loop 3 and loop 4) interacting with F-actin differ slightly in
the rigor from the rigor-like structures (a, c). In contrast to the cryo-EM
structure, loops at the interface (a, c) between F-actin and myosin are not
always resolved in crystal structures. Major structural differences in the
lever arm and converter regions are indicated by arrows and a scale bar is
given (d).


© 2016 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 9 image. Panel labels: a, pre-powerstroke state (U50 bound), Pi-release state (U50 bound), rigor state (L50/U50 bound); b, pre-powerstroke state (L50 bound), Pi-release state (L50 bound); c, pre-powerstroke state (weak, no strong interfaces), Pi-release state (stronger, L50 bound); d, flexible loop 2; e, base stabilization of loop 2.]
Extended Data Figure 9 | Different alignments of models for weak
to strong binding of myosin and strut attraction to the base of loop 2
promotes cleft closure. a-c, Three possible alignments of myosin in
the PPS (first column, purple, PDB accession number 5I4E), Pi-release
(second column, blue) and rigor (third column, red) states are illustrated
with respect to F-actin. For better visualization, differences in F-actin
are not shown and F-actin is only depicted in the M-state (green, cyan).
The Pi-release state represents a homology model of NM-2C based on a
crystal structure of myosin in the Pi-release state (PDB accession number
4PFO, see Methods). All models are either aligned to the U50 domain (a)
or the L50 domain (b) of the rigor state. In c, the model of the Pi-release
state was first aligned to the L50 domain of the rigor state. The PPS state
was then aligned to the U50 domain of model of the Pi-release state. The
first row in each subfigure shows changes in the U50 domain from the
top (for a better visualization L50 was deleted). The second row shows
the L50 domain from the bottom (U50 is transparent). Possible clashes
are indicated by a yellow star (a). d-f, Binding mechanism of the strut,
connecting L50 and U50 domains, to the stabilized base of loop 2. To
illustrate the conformational changes, the respective regions in the PPS
state (PPS, purple, PDB accession number 5I4E) and rigor state (RS, red)
of myosin have been partly overlaid. The rest of myosin is shown in grey.
L50 binds to F-actin (A-state, yellow) (d, e). The base of loop 2 is stabilized
by F-actin (e) and attracts the negatively charged strut with its positive
patch. This promotes the binding of the strut, shifting the equilibrium to
a closed conformational state of myosin (f). Flexible parts of loop 2 are
indicated as dotted lines. Lower panels show surfaces of the same regions
as in the upper panels coloured by electrostatic Coulomb potential. For
better visualization, the upper parts of the strut were removed. Surface of
F-actin is depicted in transparent grey.
Extended Data Table 1 | Data collection and refinement statistics

                                  F-actin-myosin    Bare F-actin
Data collection
  Magnification                   ×122,270
  Defocus range (µm)              0.7-2.8           0.8-2.6
  Voltage (kV)                    300
  Microscope                      Titan Krios
  Camera                          Falcon 2
  Frame recording time (s)        0.085-0.475
  Number of frames                7
  Electron dose (e/Å²)            16
  Pixel size (Å)                  1.1
Particle statistics
  Box size (px)                   256
  Boxing distance (px)            29
  Rise* (Å)                       27.5
  Azimuthal rotation* (°)         166.9
  Particles                       118,000           91,000
Model composition
  Non-hydrogen atoms              26,477            14,450
  Protein residues                3,354             1,845
  Ligand (ADP/Mg²⁺)               135/5             135/5
Refinement
  Resolution (Å)                  3.9               3.6
  Map sharpening b factor (Å²)    −200              −200
  Average B-factor (Å²)           180               98
  R factor                        0.34              0.33
  Fourier shell correlation       0.84              0.83
Rms deviations
  Bonds (Å)                       0.015             0.013
  Angles (°)                      1.83              1.74
Validation
  Molprobity score                2.24              1.82
  Clashscore, all atoms           14.16             8.93
  Poor rotamers (%)               1.57              1.36
Ramachandran plot
  Favored (%)                     93.23             96.38
  Allowed (%)                     5.14              3.07
  Outliers (%)                    1.63              0.55

*Helical symmetry parameters were estimated after C1 refinement (see Methods for further details).
Refinement statistics are given after the last step of refinements of the actomyosin and bare F-actin data set. Rms, root mean square.
REVISION THEORY

Themba reread what he’d written.
“I must know if this will work. It’s
11:35 p.m., 12 Sept. 2015. I’m in the
garage. In 2 minutes, I’m going to open the 
third drawer down on the left side of my 
workbench.” 

That was it. 

There were two possibilities. 

One: the drawer would remain as empty 
as it had been since the day he’d bought the
bench. That wouldn't necessarily mean that 
the device didn’t work, but it would be dis- 
appointing. 

Two: the drawer would no longer be 
empty. 

He folded the note carefully and 
focused on what he intended to do with 
it. He would place it in a white envelope, 
address it to himself, and deposit it in 
the red and blue postbox next to the 
notice board at the supermarket. 

He waited. 

11:37:59 p.m. 

Themba opened the drawer. Inside lay a 
yellowed envelope. 

He trembled as he retrieved the letter 
and opened it. There, written in a faded 
blue script he didn’t recognize, was a simple 
response: 

“Yes, it works.” 

Themba sat, stunned for a moment. 
Then, a wave of relief passed over him. 
Graduate school, the disastrous postdoc, 
the humiliation of taking a job in industry 
while his peers were getting their own labs. 
None of that meant anything now. Not next 
to this. 

“It works,” he said. And then, correcting
himself: “It’s going to work.”

He glanced at the drawer and turned 
his attention back to the letter. He’d seen it
before, just after they’d delivered the workbench.
He remembered opening the drawer
a hundred times since and seeing the letter 
just lying there. Why hadn’t he read it, he 
wondered, or just thrown it away? 

This was to be expected when reweaving 
history, he thought. Every moment from 
the intervention to the present is affected — 
and the whole course of the future as well. 
These weren’t false memories, he’d really
seen the letter in the drawer before and yet,
in another sense, he couldn't have. 

He picked up his own note and added: 

“PS I’ve been struggling with counterfactual
dampening, don't tell me how to do
it, but am I on the right track? Is it even a
problem?”

[Illustration caption: Time to take stock.]

The letter in the drawer had always read: 

“I don't know anything about ‘counter-
factual dampening’ but, yes, your machine 
works.” 

Had the letter changed? He remembered
adding the PS just a moment ago, but he was
certain that the response had always mentioned
the dampening issue. And who could
be using the machine without even recognizing
one of the most fundamental terms
in temporal and possible world mechanics?

He added to his note: 

“PPS Who is this?” 

And the response from the drawer had 
always read: 

“Dear Dad, this is Pat. I don’t know anything
about ‘counterfactual dampening’, but,
yes, your machine works.”

Pat. Patricia? His little girl, whom he'd 
bathed and changed just hours earlier. 
Whom he’d kissed goodnight before sneaking
out to his workshop.

Themba again added to his note.

“PPPS Pat. Do I know you’re writing to
me?”

And Patricia's response had always read: 

“Dad, you don't know that I’m doing this,
but I wanted to tell you that, yes, it works. I
don't know anything about ‘counterfactual
dampening’, sorry. I love you and miss
you. Patricia.”

Themba stared at the note for a long
while. Something about the tone bothered
him.

He crossed out the line he’d just added and
wrote: 

“PPPS Why don't I know that you're writ- 
ing this to me?” 

And Patricia's response had always read: 

“Dear Dad, I miss you so much. I suppose
you must have been on the right track
with the ‘counterfactual dampening’ issue
because, yes, your machine works. I’ve spent
a long time trying to decide how, and even
whether, I should answer the last question
in your note. We missed you so much
during those early years, but Mom said
your work was important, that it could
change everything. Once you'd shown
that you could move things through time,
we thought we’d have you back. But it
just got worse. You said there were still
problems with the device, and we saw
less of you then. Mom and I left. Miriam
found you. It was a heart attack. Your note
was among your papers. I haven't told
Mom about it. It would make her too sad.

I love you.

Patricia”

He'd been crying from the moment he’d
first opened and read Patricia's letter. There 
was no relief or joy in this success, not if it 
cost him his family and, ultimately, his life. 

He disconnected the device and placed 
it in the open drawer. He was done with it. 

Then he picked up the letter he’d written,
the letter that would eventually end up in 
Patricia's hands, and tore it to pieces while 
focusing on his intention never to mail it. 

He wanted to hold his daughter and his 
wife. 

He wanted to meddle with time no longer. 

He threw the scraps of paper into the 
small red wastebasket next to the bench. 

He reached for Patricia’s response but it 
was no longer there — it had never been 
there — and Themba was left with his hand 
suspended in mid-air with the strongest 
feeling that he was about to do something 
important but had no idea what it was. 

He supposed it had something to do 
with his work and so he opened the drawer, 
retrieved the device, and began running 
diagnostics. There was still much to do.